46 results for Parallel or distributed processing
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged, and various parallel and distributed environments have been designed and implemented. Each of these environments, including hardware and software, has unique strengths and weaknesses. No single parallel environment can be identified as the best for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of different aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify their suitability for different parallel applications. Due to the parallel and distributed nature of these environments, the networks connecting their processors were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application-specific information is data about the workload extracted from an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize this information to further refine their scheduling properties. A more accurate description of the workload is especially important when the work units are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm combines the Message Passing Interface (MPI) with threads to provide a methodology for writing parallel applications that efficiently utilize the available resources and minimize the overhead. MPIT allows communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm, which is executed by the communication thread; thus, the scheduling does not interfere with the execution of the parallel application. Performance results show that MPIT achieves considerable improvements over conventional MPI applications.
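To make the overlap of communication and computation concrete, the following is a minimal sketch of the idea in Python using mpi4py and a dedicated communication thread. It illustrates the paradigm only, not the thesis's MPIT implementation; the chunk count, tags, and the squaring "kernel" are placeholder assumptions, and an MPI build with full thread support (MPI_THREAD_MULTIPLE) is assumed.

```python
import queue
import threading

from mpi4py import MPI  # assumes an MPI build with MPI_THREAD_MULTIPLE

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
CHUNKS_PER_WORKER = 8   # illustrative work division
SENTINEL = None

if rank == 0:
    # Root collects one result per work unit from the workers.
    for _ in range((size - 1) * CHUNKS_PER_WORKER):
        unit, result = comm.recv(source=MPI.ANY_SOURCE, tag=0)
        print(f"unit {unit} -> {result}")
else:
    outbox = queue.Queue()

    def communicator():
        # Dedicated communication thread: drains the outbox so the
        # compute loop below never blocks on MPI sends.
        while True:
            item = outbox.get()
            if item is SENTINEL:
                return
            comm.send(item, dest=0, tag=0)

    t = threading.Thread(target=communicator)
    t.start()
    for i in range(CHUNKS_PER_WORKER):
        unit = rank * CHUNKS_PER_WORKER + i
        result = unit * unit          # placeholder for the real kernel
        outbox.put((unit, result))    # hand off; computation continues
    outbox.put(SENTINEL)
    t.join()
```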
Abstract:
Simulation has traditionally been used for analyzing the behavior of complex real-world problems. Even though only some features of the problems are considered, simulation time tends to become quite high even for common simulation problems. Parallel and distributed simulation is a viable technique for accelerating the simulations. The success of parallel simulation depends heavily on the combination of the simulation application, the algorithm, and the environment. In this thesis a conservative, parallel simulation algorithm is applied to the simulation of a cellular network application in a distributed workstation environment. The thesis presents a distributed simulation environment, Diworse, which is based on the use of networked workstations. The distributed environment is considered especially hard for conservative simulation algorithms due to the high cost of communication. In this thesis, however, the distributed environment is shown to be a viable alternative if the amount of communication is kept reasonable. The novel ideas of multiple message simulation and channel reduction enable efficient use of this environment for the simulation of a cellular network application. The distribution of the simulation is based on a modification of the well-known Chandy-Misra deadlock avoidance algorithm with null messages. The basic Chandy-Misra algorithm is modified by using the null message cancellation and multiple message simulation techniques. The modifications reduce the number of null messages and the time required for their execution, thus reducing the overall simulation time. The null message cancellation technique reduces the processing time of null messages, as an arriving null message cancels other unprocessed null messages. Multiple message simulation forms groups of messages, as it simulates several messages before it releases the newly created messages. If the message population in the simulation is sufficient, no additional delay is caused by this operation. A new technique for taking the simulation application into account is also presented. The performance is improved by establishing a neighborhood for the simulation elements. The neighborhood concept is based on a channel reduction technique, where the properties of the application exclusively determine which connections are necessary when a certain accuracy of simulation results is required. The distributed simulation is also analyzed in order to determine the effect of the different elements in the implemented simulation environment. This analysis is performed using critical path analysis, which allows the determination of a lower bound for the simulation time. In this thesis critical times are computed for sequential and parallel traces. The analysis based on sequential traces reveals the parallel properties of the application, whereas the analysis based on parallel traces reveals the properties of the environment and the distribution.
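As an illustration of the null message cancellation technique, the sketch below models a single input channel of a conservative logical process: a null message only carries a lower bound on future timestamps, so a newer one supersedes any older, still-unprocessed null message on the same channel. This is a schematic reconstruction, not Diworse code; the class and method names are invented for the example.

```python
import heapq

class Channel:
    """Input channel of a logical process in a Chandy-Misra style
    conservative simulation with null message cancellation."""
    def __init__(self):
        self.queue = []           # real events: (timestamp, payload)
        self.pending_null = None  # at most one unprocessed null message

    def receive(self, timestamp, kind, payload=None):
        if kind == "null":
            # Null message cancellation: an arriving null message makes
            # any older unprocessed null message redundant, so only the
            # newest lower bound is kept.
            if self.pending_null is None or timestamp > self.pending_null:
                self.pending_null = timestamp
        else:
            heapq.heappush(self.queue, (timestamp, payload))

    def lower_bound(self):
        """Earliest timestamp that can still arrive on this channel."""
        bounds = []
        if self.queue:
            bounds.append(self.queue[0][0])
        if self.pending_null is not None:
            bounds.append(self.pending_null)
        return min(bounds) if bounds else float("inf")

ch = Channel()
ch.receive(5.0, "null")
ch.receive(8.0, "null")                 # cancels the 5.0 null message
ch.receive(12.0, "event", "call_setup")
print(ch.lower_bound())                 # -> 8.0
```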
Abstract:
Memristive computing refers to the utilization of the memristor, the fourth fundamental passive circuit element, in computational tasks. The existence of the memristor was theoretically predicted in 1971 by Leon O. Chua, but experimentally validated only in 2008 by HP Labs. A memristor is essentially a nonvolatile nanoscale programmable resistor (indeed, a memory resistor) whose resistance, or memristance to be precise, is changed by applying a voltage across, or a current through, the device. Memristive computing is a new area of research, and many of its fundamental questions still remain open. For example, it is yet unclear which applications would benefit the most from the inherent nonlinear dynamics of memristors. In any case, these dynamics should be exploited to allow memristors to perform computation in a natural way, instead of attempting to emulate existing technologies such as CMOS logic. Examples of such methods of computation presented in this thesis are memristive stateful logic operations, memristive multiplication based on the translinear principle, and the exploitation of nonlinear dynamics to construct chaotic memristive circuits. This thesis considers memristive computing at various levels of abstraction. The first part of the thesis analyses the physical properties and the current-voltage behaviour of a single device. The middle part presents memristor programming methods and describes microcircuits for logic and analog operations. The final chapters discuss memristive computing in large-scale applications. In particular, cellular neural networks and associative memory architectures are proposed as applications that benefit significantly from a memristive implementation. The work presents several new results on memristor modeling and programming, memristive logic, analog arithmetic operations on memristors, and applications of memristors. The main conclusion of this thesis is that memristive computing will be advantageous in large-scale, highly parallel mixed-mode processing architectures. This can be justified by the following two arguments. First, since processing can be performed directly within memristive memory architectures, the required circuitry, processing time, and possibly also power consumption can be reduced compared to a conventional CMOS implementation. Second, intrachip communication can be naturally implemented by a memristive crossbar structure.
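For concreteness, the following sketch integrates the HP Labs linear drift memristor model, in which the memristance depends on a state variable driven by the current through the device. The parameter values and the hard windowing are illustrative assumptions, not figures from the thesis; a sinusoidal drive traces the characteristic pinched hysteresis loop in the current-voltage plane.

```python
import math

# HP Labs linear drift model: memristance M(x) = R_ON*x + R_OFF*(1-x),
# where x in [0, 1] is the doped-region fraction and dx/dt = K * i(t).
R_ON, R_OFF = 100.0, 16e3   # ohm, illustrative values
MU_V, D = 1e-14, 10e-9      # ion mobility (m^2/(V*s)), film thickness (m)
K = MU_V * R_ON / D**2      # state-drift coefficient

def simulate(v_of_t, t_end=2.0, dt=1e-5, x0=0.1):
    x, t, trace = x0, 0.0, []
    while t < t_end:
        m = R_ON * x + R_OFF * (1.0 - x)        # current memristance
        i = v_of_t(t) / m                       # current through the device
        x = min(max(x + K * i * dt, 0.0), 1.0)  # Euler step, hard window
        trace.append((t, v_of_t(t), i, m))
        t += dt
    return trace

# A 1 Hz, 1 V sinusoidal drive; plotting i against v from the trace
# shows the pinched hysteresis loop that identifies a memristor.
trace = simulate(lambda t: math.sin(2 * math.pi * t))
```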
Abstract:
The purpose of this Master's thesis is to optimize the computation of customers' electricity bills by means of distributed computing. As smart, remotely read energy meters arrive in every household, energy companies are obligated to compute customers' electricity bills on the basis of hourly metering data. The growing amount of data also increases the number of required computation tasks. The thesis evaluates alternatives for implementing the distributed computation and takes a closer look at the possibilities of cloud computing. In addition, simulations were run to assess the differences between parallel and sequential computation. A measurement tree algorithm was developed to support the correct computation of electricity bills.
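As a rough illustration of the parallel-versus-sequential comparison, the sketch below bills independent customers from hourly readings across a process pool. The data layout, prices, and pool size are invented for the example; the thesis's measurement tree algorithm is not reproduced here.

```python
from multiprocessing import Pool

HOURS = 24 * 30  # one billing month of hourly readings

def bill(customer):
    # One customer's bill: hourly consumption (kWh) times hourly price.
    cid, readings, prices = customer
    return cid, sum(kwh * p for kwh, p in zip(readings, prices))

def bill_all(customers, workers=4):
    # Customers are independent, so billing parallelizes trivially
    # across processes.
    with Pool(workers) as pool:
        return dict(pool.map(bill, customers))

if __name__ == "__main__":
    prices = [0.05] * HOURS
    customers = [(i, [0.8] * HOURS, prices) for i in range(1000)]
    print(bill_all(customers)[0])   # bill for customer 0
```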
Abstract:
Due to various advantages such as flexibility, scalability and updatability, software-intensive systems are increasingly embedded in everyday life. The constantly growing number of functions executed by these systems requires a high level of performance from the underlying platform. The main approach to increasing performance has been to raise the operating frequency of a chip. However, this has led to the problem of power dissipation, which has shifted the focus of research to parallel and distributed computing. Parallel many-core platforms can provide the required level of computational power along with low power consumption. On the one hand, this enables parallel execution of highly intensive applications; with their computational power, these platforms are likely to be used in various application domains, from home electronics (e.g., video processing) to complex critical control systems. On the other hand, the utilization of the resources has to be efficient in terms of performance and power consumption. However, the high level of on-chip integration increases the probability of various faults and the creation of hotspots leading to thermal problems. Additionally, radiation, which is frequent in space but becomes an issue also at ground level, can cause transient faults. This can eventually induce faulty execution of applications. Therefore, it is crucial to develop methods that enable efficient as well as resilient execution of applications. The main objective of the thesis is to propose an approach to designing agent-based systems for many-core platforms in a rigorous manner. When designing such a system, we explore and integrate various dynamic reconfiguration mechanisms into the agents' functionality. The use of these mechanisms enhances the resilience of the underlying platform whilst maintaining performance at an acceptable level. The design of the system proceeds according to a formal refinement approach which allows us to ensure correct behaviour of the system with respect to postulated properties. To enable analysis of the proposed system in terms of area overhead as well as performance, we explore an approach where the developed rigorous models are transformed into a high-level implementation language. Specifically, we investigate methods for deriving fault-free implementations from these models in, e.g., a hardware description language, namely VHDL.
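Purely as an informal illustration of the dynamic reconfiguration idea (the thesis develops it through formal refinement, not Python), the sketch below lets a per-core agent remap its tasks to the least loaded healthy core when a fault is detected. All names and the remapping policy are hypothetical.

```python
class CoreAgent:
    """Hypothetical monitoring agent attached to one processing core."""
    def __init__(self, core_id, platform):
        self.core_id = core_id
        self.platform = platform
        self.tasks = []
        self.healthy = True

    def on_fault(self):
        # Dynamic reconfiguration: mark the core faulty and migrate its
        # tasks to a healthy neighbour, trading performance for resilience.
        self.healthy = False
        target = self.platform.least_loaded_healthy_core()
        if target is not None:
            target.tasks.extend(self.tasks)
            self.tasks.clear()

class Platform:
    def __init__(self, n_cores):
        self.agents = [CoreAgent(i, self) for i in range(n_cores)]

    def least_loaded_healthy_core(self):
        healthy = [a for a in self.agents if a.healthy]
        return min(healthy, key=lambda a: len(a.tasks), default=None)

p = Platform(4)
p.agents[0].tasks = ["fft", "decode"]
p.agents[0].on_fault()
print([a.tasks for a in p.agents])   # tasks migrated to a healthy core
```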
Abstract:
Underdeveloped infrastructure, strict regulations and their interpretation, and complex taxation practices have caused problems for Finnish companies in China. Based on the study, companies cannot influence the development of infrastructure or the implementation of regulations, but by maintaining relationships and choosing the right partners they can manage their problem areas. For foreign companies in particular, choosing the right logistics operator is important; considering service level, cultural background and international operations, it is more effective for foreign companies to use international operators than local ones, which are often cheaper but unable to operate at an international level. Export-oriented companies should locate in free trade zones or export-oriented industrial zones. In these zones, business with the mainland is restricted, so they are not suitable for companies focused on the Chinese market. Companies operating locally should locate in normal industrial parks and use customs-supervised warehouses to support their international operations. It should also be remembered that Chinese industrial parks do not meet international criteria, so it is important to study the regulations carefully and gather opinions from other companies. In China, the most significant logistics problems arise in connection with imports and exports, where regulations and operating models are more tightly controlled. Customs clearance and value-added tax problems in particular are closely linked to the import and export process. The results of the study showed that the customs clearance process becomes more efficient through cooperation and training, but minimizing the costs arising from value-added tax requires the use of logistics parks. If the customer wants to perform customs clearance in its home province, or if the company trades with VAT-exempt companies, the use of logistics parks should be increased. By using logistics parks, companies avoid transporting products to Hong Kong and back, saving considerably in costs and delivery times. Logistics parks can also be used as a solution to growing and delayed VAT refunds. According to the results of the study, the operating environment and export-oriented manufacturing guide the selection of 3PL companies and the implementation of alternative logistics services. When establishing operations in free trade zones, the problem factors related to exports and imports intensify and the restrictions on business in mainland China increase, which makes cooperation with international logistics operators essential and encourages the use of logistics parks.
Abstract:
The purpose of this Master's thesis is to examine the characteristics of different broadband technologies and to compare them from the perspectives of users and network builders: how users respond to the development of the technology, what they are prepared to pay for it, and whether their views are aligned with those of the builders or whether the technology is ahead of the users. This thesis sought answers to these questions by interviewing both parties and comparing the answers. Broadband connections are available to almost everyone today. Most connections are implemented with copper technologies, of which ADSL is the most common alongside cable TV solutions. In sparsely populated or otherwise hard-to-reach areas, broadband connections are implemented with wireless solutions using WiMAX or @450 technologies. The criterion for a broadband connection used to be a speed of 256 kbit/s, but nowadays the users' average has risen to 2 Mbit/s. Speeds determine which applications can be used. Today a wide variety of applications and means of communication are available over the Internet. The requirements placed on broadband connections differ: some demand real-time behavior and high speed, while others are content with less. Common to all connections, however, is that the demands placed on them have grown continuously. The future is mapped out from the perspective of the technologies' potential development and of how the rest of the world is proceeding with broadband technologies. National needs and resources add their own nuance to this development.
Abstract:
Ionic liquids (ILs) have recently been studied with accelerating interest as a deconstruction/fractionation, dissolution or pretreatment processing method for lignocellulosic biomass. ILs are usually utilized in combination with heat. Regarding lignocellulosic recalcitrance toward fractionation and IL utilization, most studies concern IL utilization in the biomass fermentation process prior to the enzymatic hydrolysis step. It has been demonstrated that IL pretreatment gives more efficient hydrolysis of the biomass polysaccharides than enzymatic hydrolysis alone. Both cellulose and lignin are very resistant towards fractionation and even dissolution methods. As an example, softwood, hardwood and grass-type plant species have different types of lignin structures, which makes softwood lignin (in which guaiacyl lignin dominates) the most difficult to solubilize or chemically disrupt. In addition to the known conventional biomass processing methods, several ILs have also been found to efficiently dissolve either cellulose and/or wood samples; different ILs are suitable for different purposes. An IL treatment of wood usually results in non-fibrous pulp, where lignin is not efficiently separated and the wood components are selectively precipitated, as cellulose is not soluble or degradable in ionic liquids under mild conditions. Nevertheless, new ILs capable of rather good fractionation performance have recently emerged. The capability of an IL to dissolve or deconstruct wood or cellulose depends on several factors (e.g. sample origin, the particle size of the biomass, mechanical treatments such as pulverization, the initial biomass-to-IL ratio, the water content of the biomass, possible impurities of the IL, reaction conditions, temperature, etc.). The aim of this study was to obtain (fermentable) saccharides and other valuable chemicals from wood by a combined heat and IL treatment. Thermal treatments alone contribute to the degradation of polysaccharides (a temperature of 150 °C alone is said to cause the degradation of polysaccharides), so temperatures below that should be used if the research interest lies in the effectiveness of the IL. On the other hand, the efficiency of the IL treatment can also be enhanced by combining it with other treatment methods (e.g. microwave heating). Samples of spruce, pine and birch sawdust were treated with either 1-ethyl-3-methylimidazolium chloride, Emim Cl, or 1-ethyl-3-methylimidazolium acetate, Emim Ac, (or with deionized water for comparison) at various temperatures (with the focus between 80 and 120 °C). The samples were withdrawn at fixed time intervals (the treatment times of main interest lay between 0 and 100 hours). Duplicate experiments were performed. The selected mono- and disaccharides, as well as their known degradation products, 5-hydroxymethylfurfural (5-HMF) and furfural, were analyzed with capillary electrophoresis (CE) and high-performance liquid chromatography (HPLC). Initially, GC and GC-MS were also utilized. Galactose, glucose, mannose and xylose were the main monosaccharides present in the wood samples exposed to ILs at elevated temperatures; in addition, furfural and 5-HMF were detected, and the amounts of the two latter naturally increased in line with the heating time and the IL:wood ratio.
Abstract:
Summary: The effect of processing on the small-intestinal digestibility of amino acids in feed mixtures containing wheat by-products in pigs
Abstract:
This thesis gives an overview of the use of level set methods in the field of image science. The similar fast marching method is discussed for comparison, and the narrow band and particle level set methods are also introduced. The level set method is a numerical scheme for representing, deforming and recovering structures in arbitrary dimensions. It approximates and tracks moving interfaces, dynamic curves and surfaces. The level set method does not define how and why some boundary is advancing the way it is, but simply represents and tracks the boundary. The principal idea of the level set method is to represent the N-dimensional boundary in N+1 dimensions. This gives the generality to represent even complex boundaries. Level set methods can be powerful tools for representing dynamic boundaries, but they can require a lot of computing power. Especially the basic level set method has a considerable computational burden. This burden can be alleviated with more sophisticated versions of the level set algorithm, such as the narrow band level set method, or with a programmable hardware implementation. A parallel approach can also be used in suitable applications. It is concluded that these methods can be used in quite a broad range of image applications, such as computer vision and graphics, scientific visualization, and also the solution of problems in computational physics. Level set methods, and methods derived from and inspired by them, will be at the front line of image processing in the future as well.
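A minimal sketch of the core mechanism: a curve in two dimensions is embedded as the zero level set of a scalar field phi on a grid (the "N-dimensional boundary in N+1 dimensions" idea) and advanced by the evolution equation phi_t + F |grad phi| = 0 with a first-order upwind scheme. The grid size, speed F, and step count are illustrative assumptions.

```python
import numpy as np

n, F = 200, 1.0
x = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(x, x)
h = x[1] - x[0]
dt = 0.5 * h / F                  # CFL-stable time step for this scheme
phi = np.sqrt(X**2 + Y**2) - 0.5  # signed distance to a circle, radius 0.5

for _ in range(60):               # expand the circle at unit normal speed
    dxm = (phi - np.roll(phi, 1, axis=1)) / h   # one-sided differences
    dxp = (np.roll(phi, -1, axis=1) - phi) / h
    dym = (phi - np.roll(phi, 1, axis=0)) / h
    dyp = (np.roll(phi, -1, axis=0) - phi) / h
    # Godunov upwind gradient magnitude, valid for outward motion (F > 0).
    grad = np.sqrt(np.maximum(dxm, 0)**2 + np.minimum(dxp, 0)**2 +
                   np.maximum(dym, 0)**2 + np.minimum(dyp, 0)**2)
    phi -= dt * F * grad

# The moving front is recovered as the zero crossing of phi.
```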
Abstract:
Video transcoding refers to the process of converting a digital video from one format into another. It is a compute-intensive operation, so transcoding of a large number of simultaneous video streams requires a large amount of computing resources. Moreover, to handle different load conditions in a cost-efficient manner, the video transcoding service should be dynamically scalable. Infrastructure as a Service (IaaS) clouds currently offer computing resources, such as virtual machines, under the pay-per-use business model, and can thus be leveraged to provide a cost-efficient, dynamically scalable video transcoding service. To use computing resources efficiently in a cloud computing environment, cost-efficient virtual machine provisioning is required to avoid over-utilization and under-utilization of virtual machines. This thesis presents proactive virtual machine resource allocation and de-allocation algorithms for video transcoding in cloud computing. Since users' requests for videos may change at different times, a check is required to see whether the current computing resources are adequate for the video requests; therefore, work on admission control is also provided. In addition to admission control, temporal resolution reduction is used to avoid jitter in a video. Furthermore, in a cloud computing environment such as Amazon EC2, the computing resources are more expensive than the storage resources. Therefore, to avoid repetition of transcoding operations, a transcoded video needs to be stored for a certain time. Storing all videos for the same amount of time is not cost-efficient either, because popular transcoded videos have a high access rate while unpopular transcoded videos are rarely accessed. This thesis provides a cost-efficient computation and storage trade-off strategy, which stores videos in the video repository as long as it is cost-efficient to store them. The thesis also proposes video segmentation strategies for bit rate reduction and spatial resolution reduction video transcoding. The evaluation of the proposed strategies is performed using a message passing interface based video transcoder, which uses a coarse-grain parallel processing approach where the video is segmented at the group of pictures level.
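The computation-versus-storage trade-off can be stated in a few lines: keep a transcoded video in the repository only while its expected re-transcoding cost exceeds its storage cost. The sketch below is schematic; the prices and the popularity estimate are invented placeholders, not figures from the thesis.

```python
def keep_in_repository(expected_views_per_month,
                       transcode_cost=0.12,   # $ per transcoding run (assumed)
                       storage_cost=0.03):    # $ per video-month (assumed)
    # Expected saving from caching: each avoided view-triggered
    # transcoding saves one transcoding run.
    saved = expected_views_per_month * transcode_cost
    return saved > storage_cost

print(keep_in_repository(5.0))   # popular video: cheaper to keep it stored
print(keep_in_repository(0.1))   # rarely accessed: re-transcode on demand
```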
Abstract:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Numerical weather prediction and climate simulation have been among the computationally most demanding applications of high performance computing ever since they were started in the 1950s. Since the 1980s, the most powerful computers have featured an ever larger number of processors, and by the early 2000s this number is often several thousand. An operational weather model must use all these processors in a highly coordinated fashion. The critical resource in running such models is not computation, but the amount of necessary communication between the processors: the communication capacity of parallel computers often falls far short of their computational power. The articles in this thesis cover fourteen years of research into how to harness thousands of processors for a single weather forecast or climate simulation, so that the application can benefit as much as possible from the power of parallel high performance computers. The results attained in these articles have already been widely applied, so that currently most of the organizations that carry out global weather forecasting or climate simulation anywhere in the world use methods introduced in them. Some further studies extend the parallelization opportunities into other parts of the weather forecasting environment, in particular to the data assimilation of satellite observations.
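The communication pattern at the heart of such grid-point models is the halo exchange: each processor updates its own slab of the grid and exchanges only the boundary rows with its neighbours, so communication grows with the surface of a subdomain rather than its volume. Below is a minimal, illustrative mpi4py sketch; the slab size and the diffusion "physics" are placeholders, not the thesis's methods.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
up   = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

field = np.random.rand(34, 64)   # local slab: 32 interior rows + 2 halo rows

for _ in range(10):
    # Exchange boundary rows with the neighbouring ranks.
    comm.Sendrecv(field[1],  dest=up,   recvbuf=field[-1], source=down)
    comm.Sendrecv(field[-2], dest=down, recvbuf=field[0],  source=up)
    # Simple diffusion step as a stand-in for the model dynamics.
    field[1:-1] = 0.5 * field[1:-1] + 0.25 * (field[:-2] + field[2:])
```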
Abstract:
In order that the radius, and thus the non-uniform structure of the teeth and other electrical and magnetic parts of the machine, may be taken into consideration, the calculation of an axial flux permanent magnet machine is conventionally done by means of 3D FEM methods. This calculation procedure, however, requires a lot of time and computer resources. This study shows that analytical methods can also be applied to perform the calculation successfully. The procedure of the analytical calculation can be summarized in the following steps: first the magnet is divided into slices, then the calculation is performed for each section individually, and finally the partial results are combined into the final results. It is obvious that using this method can save a lot of design and calculation time. The calculation program is designed to model the magnetic and electrical circuits of surface-mounted axial flux permanent magnet synchronous machines in such a way that it takes into account possible magnetic saturation of the iron parts. The result of the calculation is the torque of the motor, including the vibrations. The motor geometry, the materials, and either the torque or the pole angle are defined, and the motor can be fed with three-phase currents of arbitrary shape and amplitude. There are no limits for the size or the number of pole pairs, nor for many other factors. The calculation steps and the number of different sections of the magnet are selectable, but the calculation time depends strongly on these. The results are compared to measurements of real prototypes. The permanent magnet creates part of the flux in the magnetic circuit. The form and amplitude of the flux density in the air gap depend on the geometry and material of the magnetic circuit, the length of the air gap, and the remanence flux density of the magnet. Slotting is taken into account by using the Carter factor in the slot opening area. The calculation is simple and fast if the shape of the magnet is a square and has no skew in relation to the stator slots. With a more complicated magnet shape the calculation has to be done in several sections, and it is clear that with an increasing number of sections the result also becomes more accurate. In a radial flux motor all sections of the magnets create force at the same radius. In an axial flux motor, each radial section creates force at a different radius, and the torque is the sum of these. The magnetic circuit of the motor, consisting of the stator iron, rotor iron, air gap, magnet, and slot, is modelled with a reluctance net which considers the saturation of the iron. This means that several iterations, in which the permeability is updated, have to be performed in order to get the final results. The motor torque is calculated using the instantaneous flux linkage and stator currents. Flux linkage is the part of the flux that is created by the permanent magnets and the stator currents and that passes through the coils in the stator teeth. The angle between this flux and the phase currents defines the torque created by the magnetic circuit. Due to the winding structure of the stator, and in order to limit the leakage flux, the slot openings of the stator are normally not made of ferromagnetic material, even though in some cases semi-magnetic slot wedges are used. At the slot opening faces the flux enters the iron almost normally (tangentially with respect to the rotor flux), creating tangential forces in the rotor. This phenomenon is called cogging.
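The slicing idea can be summarized in a few lines: each radial section of an axial flux magnet produces force at its own radius, so the torque is the sum of force times lever arm over the slices. The sketch below is schematic; the uniform force density function is a placeholder for the reluctance-network calculation described above.

```python
def axial_flux_torque(r_inner, r_outer, n_slices, force_density):
    """Torque of an axial flux machine as a sum over radial slices.

    force_density(r) gives the tangential force per unit radial length
    at radius r (here a placeholder for the full magnetic calculation).
    """
    dr = (r_outer - r_inner) / n_slices
    torque = 0.0
    for k in range(n_slices):
        r = r_inner + (k + 0.5) * dr       # mid-radius of slice k
        force = force_density(r) * dr      # tangential force of the slice
        torque += force * r                # each slice has its own lever arm
    return torque

# Example: uniform force density of 300 N/m across the magnet span.
print(axial_flux_torque(0.05, 0.10, 20, lambda r: 300.0))
```

Increasing n_slices refines the result, mirroring the abstract's remark that more sections give a more accurate calculation at the cost of calculation time.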
The flux in the slot opening area on the different sides of the opening and in the different slot openings is not equal, so these forces do not compensate each other. In the calculation it is assumed that the flux entering the left side of the opening is the component to the left of the geometrical centre of the slot. This torque component, together with the torque component calculated using the Lorentz force, makes up the total torque of the motor. It is easy to see that when all the magnet edges, where the derivative of the magnet flux density is at its highest, enter the slot openings at the same time, the result is a considerable cogging torque. To reduce the cogging torque, the magnet edges can be shaped so that they are not parallel to the stator slots, which is the common way to solve the problem. In doing so, the edge may be spread along the whole slot pitch, and thus also the high derivative component will be spread to occur evenly along the rotation. Besides shaping the magnets, they may also be placed somewhat asymmetrically on the rotor surface. The asymmetric distribution can be made in many different ways: all the magnets may have a different deflection from the symmetrical centre point, or they can, for example, be shifted in pairs. There are some factors that limit the deflection. The first is that the magnets cannot overlap; the magnet shape and the relative width compared to the pole define the deflection in this case. The other factor is that shifting the poles limits the maximum torque of the motor: if the edges of adjacent magnets are very close to each other, the leakage flux from one pole to the other increases, thus reducing the air-gap magnetization. The asymmetric model needs some assumptions and simplifications in order to limit the size of the model and the calculation time. The reluctance net is made for a symmetric distribution. If the magnets are distributed asymmetrically, the flux in the different pole pairs will not be exactly the same. Therefore, the assumption that the flux flows from the edges of the model to the next pole pairs (in the calculation model, from one edge to the other) is not correct. If this fact were to be considered in multi-pole-pair machines, all the poles, in other words the whole machine, would have to be modelled in the reluctance net. The error resulting from this assumption is, nevertheless, irrelevant.
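As a toy illustration of cancellation by asymmetric placement, the sketch below sums a single-harmonic cogging waveform over four magnets; shifting the magnets in pairs by half a cogging period puts the contributions of the two pairs in antiphase so the fundamental cancels. The waveform and the numbers are invented for the example and are not from the thesis.

```python
import math

N_SLOTS = 12   # assumed slot count; cogging varies as sin(N_SLOTS * theta)

def cogging(theta, shifts, amplitude=1.0):
    # Total cogging torque at rotor angle theta (rad) for magnets shifted
    # by the given angles relative to symmetric placement.
    return sum(amplitude * math.sin(N_SLOTS * (theta + s)) for s in shifts)

symmetric = [0.0, 0.0, 0.0, 0.0]
# Shift the magnets in pairs by half a cogging period (pi / N_SLOTS).
shifted = [0.0, 0.0, math.pi / N_SLOTS, math.pi / N_SLOTS]

peak = lambda shifts: max(abs(cogging(k / 100.0, shifts)) for k in range(629))
print(peak(symmetric), peak(shifted))   # roughly 4.0 versus roughly 0.0
```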