9 resultados para Memory hierarchy design
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented.
Resumo:
This dissertation concerns active fibre-reinforced composites with embedded shape memory alloy wires. The structural application of active materials allows to develop adaptive structures which actively respond to changes in the environment, such as morphing structures, self-healing structures and power harvesting devices. In particular, shape memory alloy actuators integrated within a composite actively control the structural shape or stiffness, thus influencing the composite static and dynamic properties. Envisaged applications include, among others, the prevention of thermal buckling of the outer skin of air vehicles, shape changes in panels for improved aerodynamic characteristics and the deployment of large space structures. The study and design of active composites is a complex and multidisciplinary topic, requiring in-depth understanding of both the coupled behaviour of active materials and the interaction between the different composite constituents. Both fibre-reinforced composites and shape memory alloys are extremely active research topics, whose modelling and experimental characterisation still present a number of open problems. Thus, while this dissertation focuses on active composites, some of the research results presented here can be usefully applied to traditional fibre-reinforced composites or other shape memory alloy applications. The dissertation is composed of four chapters. In the first chapter, active fibre-reinforced composites are introduced by giving an overview of the most common choices available for the reinforcement, matrix and production process, together with a brief introduction and classification of active materials. The second chapter presents a number of original contributions regarding the modelling of fibre-reinforced composites. Different two-dimensional laminate theories are derived from a parent three-dimensional theory, introducing a procedure for the a posteriori reconstruction of transverse stresses along the laminate thickness. Accurate through the thickness stresses are crucial for the composite modelling as they are responsible for some common failure mechanisms. A new finite element based on the First-order Shear Deformation Theory and a hybrid stress approach is proposed for the numerical solution of the two-dimensional laminate problem. The element is simple and computationally efficient. The transverse stresses through the laminate thickness are reconstructed starting from a general finite element solution. A two stages procedure is devised, based on Recovery by Compatibility in Patches and three-dimensional equilibrium. Finally, the determination of the elastic parameters of laminated structures via numerical-experimental Bayesian techniques is investigated. Two different estimators are analysed and compared, leading to the definition of an alternative procedure to improve convergence of the estimation process. The third chapter focuses on shape memory alloys, describing their properties and applications. A number of constitutive models proposed in the literature, both one-dimensional and three-dimensional, are critically discussed and compared, underlining their potential and limitations, which are mainly related to the definition of the phase diagram and the choice of internal variables. Some new experimental results on shape memory alloy material characterisation are also presented. These experimental observations display some features of the shape memory alloy behaviour which are generally not included in the current models, thus some ideas are proposed for the development of a new constitutive model. The fourth chapter, finally, focuses on active composite plates with embedded shape memory alloy wires. A number of di®erent approaches can be used to predict the behaviour of such structures, each model presenting different advantages and drawbacks related to complexity and versatility. A simple model able to describe both shape and stiffness control configurations within the same context is proposed and implemented. The model is then validated considering the shape control configuration, which is the most sensitive to model parameters. The experimental work is divided in two parts. In the first part, an active composite is built by gluing prestrained shape memory alloy wires on a carbon fibre laminate strip. This structure is relatively simple to build, however it is useful in order to experimentally demonstrate the feasibility of the concept proposed in the first part of the chapter. In the second part, the making of a fibre-reinforced composite with embedded shape memory alloy wires is investigated, considering different possible choices of materials and manufacturing processes. Although a number of technological issues still need to be faced, the experimental results allow to demonstrate the mechanism of shape control via embedded shape memory alloy wires, while showing a good agreement with the proposed model predictions.
Resumo:
The scale down of transistor technology allows microelectronics manufacturers such as Intel and IBM to build always more sophisticated systems on a single microchip. The classical interconnection solutions based on shared buses or direct connections between the modules of the chip are becoming obsolete as they struggle to sustain the increasing tight bandwidth and latency constraints that these systems demand. The most promising solution for the future chip interconnects are the Networks on Chip (NoC). NoCs are network composed by routers and channels used to inter- connect the different components installed on the single microchip. Examples of advanced processors based on NoC interconnects are the IBM Cell processor, composed by eight CPUs that is installed on the Sony Playstation III and the Intel Teraflops pro ject composed by 80 independent (simple) microprocessors. On chip integration is becoming popular not only in the Chip Multi Processor (CMP) research area but also in the wider and more heterogeneous world of Systems on Chip (SoC). SoC comprehend all the electronic devices that surround us such as cell-phones, smart-phones, house embedded systems, automotive systems, set-top boxes etc... SoC manufacturers such as ST Microelectronics , Samsung, Philips and also Universities such as Bologna University, M.I.T., Berkeley and more are all proposing proprietary frameworks based on NoC interconnects. These frameworks help engineers in the switch of design methodology and speed up the development of new NoC-based systems on chip. In this Thesis we propose an introduction of CMP and SoC interconnection networks. Then focusing on SoC systems we propose: • a detailed analysis based on simulation of the Spidergon NoC, a ST Microelectronics solution for SoC interconnects. The Spidergon NoC differs from many classical solutions inherited from the parallel computing world. Here we propose a detailed analysis of this NoC topology and routing algorithms. Furthermore we propose aEqualized a new routing algorithm designed to optimize the use of the resources of the network while also increasing its performance; • a methodology flow based on modified publicly available tools that combined can be used to design, model and analyze any kind of System on Chip; • a detailed analysis of a ST Microelectronics-proprietary transport-level protocol that the author of this Thesis helped developing; • a simulation-based comprehensive comparison of different network interface designs proposed by the author and the researchers at AST lab, in order to integrate shared-memory and message-passing based components on a single System on Chip; • a powerful and flexible solution to address the time closure exception issue in the design of synchronous Networks on Chip. Our solution is based on relay stations repeaters and allows to reduce the power and area demands of NoC interconnects while also reducing its buffer needs; • a solution to simplify the design of the NoC by also increasing their performance and reducing their power and area consumption. We propose to replace complex and slow virtual channel-based routers with multiple and flexible small Multi Plane ones. This solution allows us to reduce the area and power dissipation of any NoC while also increasing its performance especially when the resources are reduced. This Thesis has been written in collaboration with the Advanced System Technology laboratory in Grenoble France, and the Computer Science Department at Columbia University in the city of New York.
Resumo:
Nowadays, computing is migrating from traditional high performance and distributed computing to pervasive and utility computing based on heterogeneous networks and clients. The current trend suggests that future IT services will rely on distributed resources and on fast communication of heterogeneous contents. The success of this new range of services is directly linked to the effectiveness of the infrastructure in delivering them. The communication infrastructure will be the aggregation of different technologies even though the current trend suggests the emergence of single IP based transport service. Optical networking is a key technology to answer the increasing requests for dynamic bandwidth allocation and configure multiple topologies over the same physical layer infrastructure, optical networks today are still “far” from accessible from directly configure and offer network services and need to be enriched with more “user oriented” functionalities. However, current Control Plane architectures only facilitate efficient end-to-end connectivity provisioning and certainly cannot meet future network service requirements, e.g. the coordinated control of resources. The overall objective of this work is to provide the network with the improved usability and accessibility of the services provided by the Optical Network. More precisely, the definition of a service-oriented architecture is the enable technology to allow user applications to gain benefit of advanced services over an underlying dynamic optical layer. The definition of a service oriented networking architecture based on advanced optical network technologies facilitates users and applications access to abstracted levels of information regarding offered advanced network services. This thesis faces the problem to define a Service Oriented Architecture and its relevant building blocks, protocols and languages. In particular, this work has been focused on the use of the SIP protocol as a inter-layers signalling protocol which defines the Session Plane in conjunction with the Network Resource Description language. On the other hand, an advantage optical network must accommodate high data bandwidth with different granularities. Currently, two main technologies are emerging promoting the development of the future optical transport network, Optical Burst and Packet Switching. Both technologies respectively promise to provide all optical burst or packet switching instead of the current circuit switching. However, the electronic domain is still present in the scheduler forwarding and routing decision. Because of the high optics transmission frequency the burst or packet scheduler faces a difficult challenge, consequentially, high performance and time focused design of both memory and forwarding logic is need. This open issue has been faced in this thesis proposing an high efficiently implementation of burst and packet scheduler. The main novelty of the proposed implementation is that the scheduling problem has turned into simple calculation of a min/max function and the function complexity is almost independent of on the traffic conditions.
Resumo:
In this thesis we study three combinatorial optimization problems belonging to the classes of Network Design and Vehicle Routing problems that are strongly linked in the context of the design and management of transportation networks: the Non-Bifurcated Capacitated Network Design Problem (NBP), the Period Vehicle Routing Problem (PVRP) and the Pickup and Delivery Problem with Time Windows (PDPTW). These problems are NP-hard and contain as special cases some well known difficult problems such as the Traveling Salesman Problem and the Steiner Tree Problem. Moreover, they model the core structure of many practical problems arising in logistics and telecommunications. The NBP is the problem of designing the optimum network to satisfy a given set of traffic demands. Given a set of nodes, a set of potential links and a set of point-to-point demands called commodities, the objective is to select the links to install and dimension their capacities so that all the demands can be routed between their respective endpoints, and the sum of link fixed costs and commodity routing costs is minimized. The problem is called non- bifurcated because the solution network must allow each demand to follow a single path, i.e., the flow of each demand cannot be splitted. Although this is the case in many real applications, the NBP has received significantly less attention in the literature than other capacitated network design problems that allow bifurcation. We describe an exact algorithm for the NBP that is based on solving by an integer programming solver a formulation of the problem strengthened by simple valid inequalities and four new heuristic algorithms. One of these heuristics is an adaptive memory metaheuristic, based on partial enumeration, that could be applied to a wider class of structured combinatorial optimization problems. In the PVRP a fleet of vehicles of identical capacity must be used to service a set of customers over a planning period of several days. Each customer specifies a service frequency, a set of allowable day-combinations and a quantity of product that the customer must receive every time he is visited. For example, a customer may require to be visited twice during a 5-day period imposing that these visits take place on Monday-Thursday or Monday-Friday or Tuesday-Friday. The problem consists in simultaneously assigning a day- combination to each customer and in designing the vehicle routes for each day so that each customer is visited the required number of times, the number of routes on each day does not exceed the number of vehicles available, and the total cost of the routes over the period is minimized. We also consider a tactical variant of this problem, called Tactical Planning Vehicle Routing Problem, where customers require to be visited on a specific day of the period but a penalty cost, called service cost, can be paid to postpone the visit to a later day than that required. At our knowledge all the algorithms proposed in the literature for the PVRP are heuristics. In this thesis we present for the first time an exact algorithm for the PVRP that is based on different relaxations of a set partitioning-like formulation. The effectiveness of the proposed algorithm is tested on a set of instances from the literature and on a new set of instances. Finally, the PDPTW is to service a set of transportation requests using a fleet of identical vehicles of limited capacity located at a central depot. Each request specifies a pickup location and a delivery location and requires that a given quantity of load is transported from the pickup location to the delivery location. Moreover, each location can be visited only within an associated time window. Each vehicle can perform at most one route and the problem is to satisfy all the requests using the available vehicles so that each request is serviced by a single vehicle, the load on each vehicle does not exceed the capacity, and all locations are visited according to their time window. We formulate the PDPTW as a set partitioning-like problem with additional cuts and we propose an exact algorithm based on different relaxations of the mathematical formulation and a branch-and-cut-and-price algorithm. The new algorithm is tested on two classes of problems from the literature and compared with a recent branch-and-cut-and-price algorithm from the literature.
Resumo:
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
Resumo:
Shape memory materials (SMMs) represent an important class of smart materials that have the ability to return from a deformed state to their original shape. Thanks to such a property, SMMs are utilized in a wide range of innovative applications. The increasing number of applications and the consequent involvement of industrial players in the field have motivated researchers to formulate constitutive models able to catch the complex behavior of these materials and to develop robust computational tools for design purposes. Such a research field is still under progress, especially in the prediction of shape memory polymer (SMP) behavior and of important effects characterizing shape memory alloy (SMA) applications. Moreover, the frequent use of shape memory and metallic materials in biomedical devices, particularly in cardiovascular stents, implanted in the human body and experiencing millions of in-vivo cycles by the blood pressure, clearly indicates the need for a deeper understanding of fatigue/fracture failure in microsize components. The development of reliable stent designs against fatigue is still an open subject in scientific literature. Motivated by the described framework, the thesis focuses on several research issues involving the advanced constitutive, numerical and fatigue modeling of elastoplastic and shape memory materials. Starting from the constitutive modeling, the thesis proposes to develop refined phenomenological models for reliable SMA and SMP behavior descriptions. Then, concerning the numerical modeling, the thesis proposes to implement the models into numerical software by developing implicit/explicit time-integration algorithms, to guarantee robust computational tools for practical purposes. The described modeling activities are completed by experimental investigations on SMA actuator springs and polyethylene polymers. Finally, regarding the fatigue modeling, the thesis proposes the introduction of a general computational approach for the fatigue-life assessment of a classical stent design, in order to exploit computer-based simulations to prevent failures and modify design, without testing numerous devices.
Resumo:
Network monitoring is of paramount importance for effective network management: it allows to constantly observe the network’s behavior to ensure it is working as intended and can trigger both automated and manual remediation procedures in case of failures and anomalies. The concept of SDN decouples the control logic from legacy network infrastructure to perform centralized control on multiple switches in the network, and in this context, the responsibility of switches is only to forward packets according to the flow control instructions provided by controller. However, as current SDN switches only expose simple per-port and per-flow counters, the controller has to do almost all the processing to determine the network state, which causes significant communication overhead and excessive latency for monitoring purposes. The absence of programmability in the data plane of SDN prompted the advent of programmable switches, which allow developers to customize the data-plane pipeline and implement novel programs operating directly in the switches. This means that we can offload certain monitoring tasks to programmable data planes, to perform fine-grained monitoring even at very high packet processing speeds. Given the central importance of network monitoring exploiting programmable data planes, the goal of this thesis is to enable a wide range of monitoring tasks in programmable switches, with a specific focus on the ones equipped with programmable ASICs. Indeed, most network monitoring solutions available in literature do not take computational and memory constraints of programmable switches into due account, preventing, de facto, their successful implementation in commodity switches. This claims that network monitoring tasks can be executed in programmable switches. Our evaluations show that the contributions in this thesis could be used by network administrators as well as network security engineers, to better understand the network status depending on different monitoring metrics, and thus prevent network infrastructure and service outages.
Resumo:
The first topic analyzed in the thesis will be Neural Architecture Search (NAS). I will focus on two different tools that I developed, one to optimize the architecture of Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing that has recently emerged, and one to optimize the data precision of tensors inside CNNs. The first NAS proposed explicitly targets the optimization of the most peculiar architectural parameters of TCNs, namely dilation, receptive field, and the number of features in each layer. Note that this is the first NAS that explicitly targets these networks. The second NAS proposed instead focuses on finding the most efficient data format for a target CNN, with the granularity of the layer filter. Note that applying these two NASes in sequence allows an "application designer" to minimize the structure of the neural network employed, minimizing the number of operations or the memory usage of the network. After that, the second topic described is the optimization of neural network deployment on edge devices. Importantly, exploiting edge platforms' scarce resources is critical for NN efficient execution on MCUs. To do so, I will introduce DORY (Deployment Oriented to memoRY) -- an automatic tool to deploy CNNs on low-cost MCUs. DORY, in different steps, can manage different levels of memory inside the MCU automatically, offload the computation workload (i.e., the different layers of a neural network) to dedicated hardware accelerators, and automatically generates ANSI C code that orchestrates off- and on-chip transfers with the computation phases. On top of this, I will introduce two optimized computation libraries that DORY can exploit to deploy TCNs and Transformers on edge efficiently. I conclude the thesis with two different applications on bio-signal analysis, i.e., heart rate tracking and sEMG-based gesture recognition.