16 resultados para parallel computation model
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of difference aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify theirsuitability for different parallel applications. Due to the parallel and distributed nature of the environments, networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application- specific information is data about the workload extractedfrom an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the workunits are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology to write parallel applications that efficiently utilize the available resources and minimize the overhead. The MPIT allows for communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread. Thus, the scheduling does not affect the execution of the parallel application. Performance results achieved from the MPIT show that considerable improvements over conventional MPI applications are achieved.
Resumo:
Diplomityön tarkoituksena on optimoida asiakkaiden sähkölaskun laskeminen hajautetun laskennan avulla. Älykkäiden etäluettavien energiamittareiden tullessa jokaiseen kotitalouteen, energiayhtiöt velvoitetaan laskemaan asiakkaiden sähkölaskut tuntiperusteiseen mittaustietoon perustuen. Kasvava tiedonmäärä lisää myös tarvittavien laskutehtävien määrää. Työssä arvioidaan vaihtoehtoja hajautetun laskennan toteuttamiseksi ja luodaan tarkempi katsaus pilvilaskennan mahdollisuuksiin. Lisäksi ajettiin simulaatioita, joiden avulla arvioitiin rinnakkaislaskennan ja peräkkäislaskennan eroja. Sähkölaskujen oikeinlaskemisen tueksi kehitettiin mittauspuu-algoritmi.
Resumo:
Cellular automata are models for massively parallel computation. A cellular automaton consists of cells which are arranged in some kind of regular lattice and a local update rule which updates the state of each cell according to the states of the cell's neighbors on each step of the computation. This work focuses on reversible one-dimensional cellular automata in which the cells are arranged in a two-way in_nite line and the computation is reversible, that is, the previous states of the cells can be derived from the current ones. In this work it is shown that several properties of reversible one-dimensional cellular automata are algorithmically undecidable, that is, there exists no algorithm that would tell whether a given cellular automaton has the property or not. It is shown that the tiling problem of Wang tiles remains undecidable even in some very restricted special cases. It follows that it is undecidable whether some given states will always appear in computations by the given cellular automaton. It also follows that a weaker form of expansivity, which is a concept of dynamical systems, is an undecidable property for reversible one-dimensional cellular automata. It is shown that several properties of dynamical systems are undecidable for reversible one-dimensional cellular automata. It shown that sensitivity to initial conditions and topological mixing are undecidable properties. Furthermore, non-sensitive and mixing cellular automata are recursively inseparable. It follows that also chaotic behavior is an undecidable property for reversible one-dimensional cellular automata.
Resumo:
Numerical weather prediction and climate simulation have been among the computationally most demanding applications of high performance computing eversince they were started in the 1950's. Since the 1980's, the most powerful computers have featured an ever larger number of processors. By the early 2000's, this number is often several thousand. An operational weather model must use all these processors in a highly coordinated fashion. The critical resource in running such models is not computation, but the amount of necessary communication between the processors. The communication capacity of parallel computers often fallsfar short of their computational power. The articles in this thesis cover fourteen years of research into how to harness thousands of processors on a single weather forecast or climate simulation, so that the application can benefit as much as possible from the power of parallel high performance computers. The resultsattained in these articles have already been widely applied, so that currently most of the organizations that carry out global weather forecasting or climate simulation anywhere in the world use methods introduced in them. Some further studies extend parallelization opportunities into other parts of the weather forecasting environment, in particular to data assimilation of satellite observations.
Resumo:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult as the current popular programming languages are inherently sequential and introducing parallelism is typically up to the programmer. Dataflow, however, is inherently parallel, describing an application as a directed graph, where nodes represent calculations and edges represent a data dependency in form of a queue. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit and thereby also the parallelism. Once a node have the su cient inputs available, the node can, independently of any other node, perform calculations, consume inputs, and produce outputs. Data ow models have existed for several decades and have become popular for describing signal processing applications as the graph representation is a very natural representation within this eld. Digital lters are typically described with boxes and arrows also in textbooks. Data ow is also becoming more interesting in other domains, and in principle, any application working on an information stream ts the dataflow paradigm. Such applications are, among others, network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be use within reconfigurable video coding. Describing a video coder as a data ow network instead of with conventional programming languages, makes the coder more readable as it describes how the video dataflows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for data ow to be more widely used. The explicit parallelism of a dataflow program is descriptive and enables an improved utilization of available processing units, however, the independent nodes also implies that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes are running concurrently on one processor core. There exist several data ow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable, with minimal scheduling overhead, to dynamic where each ring requires a ring rule to evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling, however, the strong encapsulation of dataflow nodes enables analysis and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with nding the few scheduling decisions that must be run-time, while most decisions are pre-calculated. The result is then an, as small as possible, set of static schedules that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information to generate a minimal but complete model to be used for model checking. The model must describe everything that may affect scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to nd the actual schedules, a set of scheduling strategies are de ned which are able to produce quasi-static schedulers for a wide range of applications. The results of this work show that actor composition with quasi-static scheduling can be used to transform data ow programs to t many different computer architecture with different type and number of cores. This in turn, enables dataflow to provide a more platform independent representation as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is in the context of design space exploration optimized by the development tools to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
Resumo:
In order that the radius and thus ununiform structure of the teeth and otherelectrical and magnetic parts of the machine may be taken into consideration the calculation of an axial flux permanent magnet machine is, conventionally, doneby means of 3D FEM-methods. This calculation procedure, however, requires a lotof time and computer recourses. This study proves that also analytical methods can be applied to perform the calculation successfully. The procedure of the analytical calculation can be summarized into following steps: first the magnet is divided into slices, which makes the calculation for each section individually, and then the parts are submitted to calculation of the final results. It is obvious that using this method can save a lot of designing and calculating time. Thecalculation program is designed to model the magnetic and electrical circuits of surface mounted axial flux permanent magnet synchronous machines in such a way, that it takes into account possible magnetic saturation of the iron parts. Theresult of the calculation is the torque of the motor including the vibrations. The motor geometry and the materials and either the torque or pole angle are defined and the motor can be fed with an arbitrary shape and amplitude of three-phase currents. There are no limits for the size and number of the pole pairs nor for many other factors. The calculation steps and the number of different sections of the magnet are selectable, but the calculation time is strongly depending on this. The results are compared to the measurements of real prototypes. The permanent magnet creates part of the flux in the magnetic circuit. The form and amplitude of the flux density in the air-gap depends on the geometry and material of the magnetic circuit, on the length of the air-gap and remanence flux density of the magnet. Slotting is taken into account by using the Carter factor in the slot opening area. The calculation is simple and fast if the shape of the magnetis a square and has no skew in relation to the stator slots. With a more complicated magnet shape the calculation has to be done in several sections. It is clear that according to the increasing number of sections also the result will become more accurate. In a radial flux motor all sections of the magnets create force with a same radius. In the case of an axial flux motor, each radial section creates force with a different radius and the torque is the sum of these. The magnetic circuit of the motor, consisting of the stator iron, rotor iron, air-gap, magnet and the slot, is modelled with a reluctance net, which considers the saturation of the iron. This means, that several iterations, in which the permeability is updated, has to be done in order to get final results. The motor torque is calculated using the instantaneous linkage flux and stator currents. Flux linkage is called the part of the flux that is created by the permanent magnets and the stator currents passing through the coils in stator teeth. The angle between this flux and the phase currents define the torque created by the magnetic circuit. Due to the winding structure of the stator and in order to limit the leakage flux the slot openings of the stator are normally not made of ferromagnetic material even though, in some cases, semimagnetic slot wedges are used. In the slot opening faces the flux enters the iron almost normally (tangentially with respect to the rotor flux) creating tangential forces in the rotor. This phenomenon iscalled cogging. The flux in the slot opening area on the different sides of theopening and in the different slot openings is not equal and so these forces do not compensate each other. In the calculation it is assumed that the flux entering the left side of the opening is the component left from the geometrical centre of the slot. This torque component together with the torque component calculated using the Lorenz force make the total torque of the motor. It is easy to assume that when all the magnet edges, where the derivative component of the magnet flux density is at its highest, enter the slot openings at the same time, this will have as a result a considerable cogging torque. To reduce the cogging torquethe magnet edges can be shaped so that they are not parallel to the stator slots, which is the common way to solve the problem. In doing so, the edge may be spread along the whole slot pitch and thus also the high derivative component willbe spread to occur equally along the rotation. Besides forming the magnets theymay also be placed somewhat asymmetric on the rotor surface. The asymmetric distribution can be made in many different ways. All the magnets may have a different deflection of the symmetrical centre point or they can be for example shiftedin pairs. There are some factors that limit the deflection. The first is that the magnets cannot overlap. The magnet shape and the relative width compared to the pole define the deflection in this case. The other factor is that a shifting of the poles limits the maximum torque of the motor. If the edges of adjacent magnets are very close to each other the leakage flux from one pole to the other increases reducing thus the air-gap magnetization. The asymmetric model needs some assumptions and simplifications in order to limit the size of the model and calculation time. The reluctance net is made for symmetric distribution. If the magnets are distributed asymmetrically the flux in the different pole pairs will not be exactly the same. Therefore, the assumption that the flux flows from the edges of the model to the next pole pairs, in the calculation model from one edgeto the other, is not correct. If it were wished for that this fact should be considered in multi-pole pair machines, this would mean that all the poles, in other words the whole machine, should be modelled in reluctance net. The error resulting from this wrong assumption is, nevertheless, irrelevant.
Resumo:
This thesis presents briefly the basic operation and use of centrifugal pumps and parallel pumping applications. The characteristics of parallel pumping applications are compared to circuitry, in order to search analogy between these technical fields. The purpose of studying circuitry is to find out if common software tools for solving circuit performance could be used to observe parallel pumping applications. The empirical part of the thesis introduces a simulation environment for parallel pumping systems, which is based on circuit components of Matlab Simulink —software. The created simulation environment ensures the observation of variable speed controlled parallel pumping systems in case of different controlling methods. The introduced simulation environment was evaluated by building a simulation model for actual parallel pumping system at Lappeenranta University of Technology. The simulated performance of the parallel pumps was compared to measured values of the actual system. The gathered information shows, that if the initial data of the system and pump perfonnance is adequate, the circuitry based simulation environment can be exploited to observe parallel pumping systems. The introduced simulation environment can represent the actual operation of parallel pumps in reasonably accuracy. There by the circuitry based simulation can be used as a researching tool to develop new controlling ways for parallel pumps.
Resumo:
Over the last decades, calibration techniques have been widely used to improve the accuracy of robots and machine tools since they only involve software modification instead of changing the design and manufacture of the hardware. Traditionally, there are four steps are required for a calibration, i.e. error modeling, measurement, parameter identification and compensation. The objective of this thesis is to propose a method for the kinematics analysis and error modeling of a newly developed hybrid redundant robot IWR (Intersector Welding Robot), which possesses ten degrees of freedom (DOF) where 6-DOF in parallel and additional 4-DOF in serial. In this article, the problem of kinematics modeling and error modeling of the proposed IWR robot are discussed. Based on the vector arithmetic method, the kinematics model and the sensitivity model of the end-effector subject to the structure parameters is derived and analyzed. The relations between the pose (position and orientation) accuracy and manufacturing tolerances, actuation errors, and connection errors are formulated. Computer simulation is performed to examine the validity and effectiveness of the proposed method.
Resumo:
Novel biomaterials are needed to fill the demand of tailored bone substitutes required by an ever‐expanding array of surgical procedures and techniques. Wood, a natural fiber composite, modified with heat treatment to alter its composition, may provide a novel approach to the further development of hierarchically structured biomaterials. The suitability of wood as a model biomaterial as well as the effects of heat treatment on the osteoconductivity of wood was studied by placing untreated and heat‐treated (at 220 C , 200 degrees and 140 degrees for 2 h) birch implants (size 4 x 7mm) into drill cavities in the distal femur of rabbits. The follow‐up period was 4, 8 and 20 weeks in all in vivo experiments. The flexural properties of wood as well as dimensional changes and hydroxyl apatite formation on the surface of wood (untreated, 140 degrees C and 200 degrees C heat‐treated wood) were tested using 3‐point bending and compression tests and immersion in simulated body fluid. The effect of premeasurement grinding and the effect of heat treatment on the surface roughness and contour of wood were tested with contact stylus and non‐contact profilometry. The effects of heat treatment of wood on its interactions with biological fluids was assessed using two different test media and real human blood in liquid penetration tests. The results of the in vivo experiments showed implanted wood to be well tolerated, with no implants rejected due to foreign body reactions. Heat treatment had significant effects on the biocompatibility of wood, allowing host bone to grow into tight contact with the implant, with occasional bone ingrowth into the channels of the wood implant. The results of the liquid immersion experiments showed hydroxyl apatite formation only in the most extensively heat‐treated wood specimens, which supported the results of the in vivo experiments. Parallel conclusions could be drawn based on the results of the liquid penetration test where human blood had the most favorable interaction with the most extensively heat‐treated wood of the compared materials (untreated, 140 degrees C and 200 degrees C heat‐treated wood). The increasing biocompatibility was inferred to result mainly from changes in the chemical composition of wood induced by the heat treatment, namely the altered arrangement and concentrations of functional chemical groups. However, the influence of microscopic changes in the cell walls, surface roughness and contour cannot be totally excluded. The heat treatment was hypothesized to produce a functional change in the liquid distribution within wood, which could have biological relevance. It was concluded that the highly evolved hierarchical anatomy of wood could yield information for the future development of bulk bone substitutes according to the ideology of bioinspiration. Furthermore, the results of the biomechanical tests established that heat treatment alters various biologically relevant mechanical properties of wood, thus expanding the possibilities of wood as a model material, which could include e.g. scaffold applications, bulk bone applications and serving as a tool for both mechanical testing and for further development of synthetic fiber reinforced composites.
Resumo:
To obtain the desirable accuracy of a robot, there are two techniques available. The first option would be to make the robot match the nominal mathematic model. In other words, the manufacturing and assembling tolerances of every part would be extremely tight so that all of the various parameters would match the “design” or “nominal” values as closely as possible. This method can satisfy most of the accuracy requirements, but the cost would increase dramatically as the accuracy requirement increases. Alternatively, a more cost-effective solution is to build a manipulator with relaxed manufacturing and assembling tolerances. By modifying the mathematical model in the controller, the actual errors of the robot can be compensated. This is the essence of robot calibration. Simply put, robot calibration is the process of defining an appropriate error model and then identifying the various parameter errors that make the error model match the robot as closely as possible. This work focuses on kinematic calibration of a 10 degree-of-freedom (DOF) redundant serial-parallel hybrid robot. The robot consists of a 4-DOF serial mechanism and a 6-DOF hexapod parallel manipulator. The redundant 4-DOF serial structure is used to enlarge workspace and the 6-DOF hexapod manipulator is used to provide high load capabilities and stiffness for the whole structure. The main objective of the study is to develop a suitable calibration method to improve the accuracy of the redundant serial-parallel hybrid robot. To this end, a Denavit–Hartenberg (DH) hybrid error model and a Product-of-Exponential (POE) error model are developed for error modeling of the proposed robot. Furthermore, two kinds of global optimization methods, i.e. the differential-evolution (DE) algorithm and the Markov Chain Monte Carlo (MCMC) algorithm, are employed to identify the parameter errors of the derived error model. A measurement method based on a 3-2-1 wire-based pose estimation system is proposed and implemented in a Solidworks environment to simulate the real experimental validations. Numerical simulations and Solidworks prototype-model validations are carried out on the hybrid robot to verify the effectiveness, accuracy and robustness of the calibration algorithms.
Resumo:
Video transcoding refers to the process of converting a digital video from one format into another format. It is a compute-intensive operation. Therefore, transcoding of a large number of simultaneous video streams requires a large amount of computing resources. Moreover, to handle di erent load conditions in a cost-e cient manner, the video transcoding service should be dynamically scalable. Infrastructure as a Service Clouds currently offer computing resources, such as virtual machines, under the pay-per-use business model. Thus the IaaS Clouds can be leveraged to provide a coste cient, dynamically scalable video transcoding service. To use computing resources e ciently in a cloud computing environment, cost-e cient virtual machine provisioning is required to avoid overutilization and under-utilization of virtual machines. This thesis presents proactive virtual machine resource allocation and de-allocation algorithms for video transcoding in cloud computing. Since users' requests for videos may change at di erent times, a check is required to see if the current computing resources are adequate for the video requests. Therefore, the work on admission control is also provided. In addition to admission control, temporal resolution reduction is used to avoid jitters in a video. Furthermore, in a cloud computing environment such as Amazon EC2, the computing resources are more expensive as compared with the storage resources. Therefore, to avoid repetition of transcoding operations, a transcoded video needs to be stored for a certain time. To store all videos for the same amount of time is also not cost-e cient because popular transcoded videos have high access rate while unpopular transcoded videos are rarely accessed. This thesis provides a cost-e cient computation and storage trade-o strategy, which stores videos in the video repository as long as it is cost-e cient to store them. This thesis also proposes video segmentation strategies for bit rate reduction and spatial resolution reduction video transcoding. The evaluation of proposed strategies is performed using a message passing interface based video transcoder, which uses a coarse-grain parallel processing approach where video is segmented at group of pictures level.
Resumo:
The pumping processes requiring wide range of flow are often equipped with parallelconnected centrifugal pumps. In parallel pumping systems, the use of variable speed control allows that the required output for the process can be delivered with a varying number of operated pump units and selected rotational speed references. However, the optimization of the parallel-connected rotational speed controlled pump units often requires adaptive modelling of both parallel pump characteristics and the surrounding system in varying operation conditions. The available information required for the system modelling in typical parallel pumping applications such as waste water treatment and various cooling and water delivery pumping tasks can be limited, and the lack of real-time operation point monitoring often sets limits for accurate energy efficiency optimization. Hence, alternatives for easily implementable control strategies which can be adopted with minimum system data are necessary. This doctoral thesis concentrates on the methods that allow the energy efficient use of variable speed controlled parallel pumps in system scenarios in which the parallel pump units consist of a centrifugal pump, an electric motor, and a frequency converter. Firstly, the suitable operation conditions for variable speed controlled parallel pumps are studied. Secondly, methods for determining the output of each parallel pump unit using characteristic curve-based operation point estimation with frequency converter are discussed. Thirdly, the implementation of the control strategy based on real-time pump operation point estimation and sub-optimization of each parallel pump unit is studied. The findings of the thesis support the idea that the energy efficiency of the pumping can be increased without the installation of new, more efficient components in the systems by simply adopting suitable control strategies. An easily implementable and adaptive control strategy for variable speed controlled parallel pumping systems can be created by utilizing the pump operation point estimation available in modern frequency converters. Hence, additional real-time flow metering, start-up measurements, and detailed system model are unnecessary, and the pumping task can be fulfilled by determining a speed reference for each parallel-pump unit which suggests the energy efficient operation of the pumping system.
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
The dissertation proposes two control strategies, which include the trajectory planning and vibration suppression, for a kinematic redundant serial-parallel robot machine, with the aim of attaining the satisfactory machining performance. For a given prescribed trajectory of the robot's end-effector in the Cartesian space, a set of trajectories in the robot's joint space are generated based on the best stiffness performance of the robot along the prescribed trajectory. To construct the required system-wide analytical stiffness model for the serial-parallel robot machine, a variant of the virtual joint method (VJM) is proposed in the dissertation. The modified method is an evolution of Gosselin's lumped model that can account for the deformations of a flexible link in more directions. The effectiveness of this VJM variant is validated by comparing the computed stiffness results of a flexible link with the those of a matrix structural analysis (MSA) method. The comparison shows that the numerical results from both methods on an individual flexible beam are almost identical, which, in some sense, provides mutual validation. The most prominent advantage of the presented VJM variant compared with the MSA method is that it can be applied in a flexible structure system with complicated kinematics formed in terms of flexible serial links and joints. Moreover, by combining the VJM variant and the virtual work principle, a systemwide analytical stiffness model can be easily obtained for mechanisms with both serial kinematics and parallel kinematics. In the dissertation, a system-wide stiffness model of a kinematic redundant serial-parallel robot machine is constructed based on integration of the VJM variant and the virtual work principle. Numerical results of its stiffness performance are reported. For a kinematic redundant robot, to generate a set of feasible joints' trajectories for a prescribed trajectory of its end-effector, its system-wide stiffness performance is taken as the constraint in the joints trajectory planning in the dissertation. For a prescribed location of the end-effector, the robot permits an infinite number of inverse solutions, which consequently yields infinite kinds of stiffness performance. Therefore, a differential evolution (DE) algorithm in which the positions of redundant joints in the kinematics are taken as input variables was employed to search for the best stiffness performance of the robot. Numerical results of the generated joint trajectories are given for a kinematic redundant serial-parallel robot machine, IWR (Intersector Welding/Cutting Robot), when a particular trajectory of its end-effector has been prescribed. The numerical results show that the joint trajectories generated based on the stiffness optimization are feasible for realization in the control system since they are acceptably smooth. The results imply that the stiffness performance of the robot machine deviates smoothly with respect to the kinematic configuration in the adjacent domain of its best stiffness performance. To suppress the vibration of the robot machine due to varying cutting force during the machining process, this dissertation proposed a feedforward control strategy, which is constructed based on the derived inverse dynamics model of target system. The effectiveness of applying such a feedforward control in the vibration suppression has been validated in a parallel manipulator in the software environment. The experimental study of such a feedforward control has also been included in the dissertation. The difficulties of modelling the actual system due to the unknown components in its dynamics is noticed. As a solution, a back propagation (BP) neural network is proposed for identification of the unknown components of the dynamics model of the target system. To train such a BP neural network, a modified Levenberg-Marquardt algorithm that can utilize an experimental input-output data set of the entire dynamic system is introduced in the dissertation. Validation of the BP neural network and the modified Levenberg- Marquardt algorithm is done, respectively, by a sinusoidal output approximation, a second order system parameters estimation, and a friction model estimation of a parallel manipulator, which represent three different application aspects of this method.
Resumo:
This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize the on-chip resources. As the dark silicon era approaches, where the power considerations will allow only a fraction chip to be powered on, judicious resource management will become a key consideration in future designs. Most of the works on resource management treat only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulate the component to application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE) that collectively ensure that each application meets its deadlines using minimal platform resources. In this work several novel architectural enhancements, algorithms and policies are presented to realize the virtual runtime application partitions efficiently. Considering the future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Network on Chips (NoCs) to test the feasibility of our approach. Specifically, we have chosen Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments. Synthesis and simulation results demonstrate VRAP significantly enhances the energy and power efficiency compared to state of the art.