49 resultados para Parallel computing. Multilayer perceptron. OpenMP

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tietokonejärjestelmän osien ja ohjelmistojen suorituskykymittauksista saadaan tietoa,jota voidaan käyttää suorituskyvyn parantamiseen ja laitteistohankintojen päätöksen tukena. Tässä työssä tutustutaan suorituskyvyn mittaamiseen ja mittausohjelmiin eli ns. benchmark-ohjelmistoihin. Työssä etsittiin ja arvioitiin eri tyyppisiä vapaasti saatavilla olevia benchmark-ohjelmia, jotka soveltuvat Linux-laskentaklusterin suorituskyvynanalysointiin. Benchmarkit ryhmiteltiin ja arvioitiin testaamalla niiden ominaisuuksia Linux-klusterissa. Työssä käsitellään myös mittausten tekemisen ja rinnakkaislaskennan haasteita. Benchmarkkeja löytyi moneen tarkoitukseen ja ne osoittautuivat laadultaan ja laajuudeltaan vaihteleviksi. Niitä on myös koottu ohjelmistopaketeiksi, jotta laitteiston suorituskyvystä saisi laajemman kuvan kuin mitä yhdellä ohjelmalla on mahdollista saada. Olennaista on ymmärtää nopeus, jolla dataa saadaan siirretyä prosessorille keskusmuistista, levyjärjestelmistä ja toisista laskentasolmuista. Tyypillinen benchmark-ohjelma sisältää paljon laskentaa tarvitsevan matemaattisen algoritmin, jota käytetään tieteellisissä ohjelmistoissa. Benchmarkista riippuen tulosten ymmärtäminen ja hyödyntäminen voi olla haasteellista.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of difference aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify theirsuitability for different parallel applications. Due to the parallel and distributed nature of the environments, networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application- specific information is data about the workload extractedfrom an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the workunits are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology to write parallel applications that efficiently utilize the available resources and minimize the overhead. The MPIT allows for communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread. Thus, the scheduling does not affect the execution of the parallel application. Performance results achieved from the MPIT show that considerable improvements over conventional MPI applications are achieved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Numerical weather prediction and climate simulation have been among the computationally most demanding applications of high performance computing eversince they were started in the 1950's. Since the 1980's, the most powerful computers have featured an ever larger number of processors. By the early 2000's, this number is often several thousand. An operational weather model must use all these processors in a highly coordinated fashion. The critical resource in running such models is not computation, but the amount of necessary communication between the processors. The communication capacity of parallel computers often fallsfar short of their computational power. The articles in this thesis cover fourteen years of research into how to harness thousands of processors on a single weather forecast or climate simulation, so that the application can benefit as much as possible from the power of parallel high performance computers. The resultsattained in these articles have already been widely applied, so that currently most of the organizations that carry out global weather forecasting or climate simulation anywhere in the world use methods introduced in them. Some further studies extend parallelization opportunities into other parts of the weather forecasting environment, in particular to data assimilation of satellite observations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This master’s thesis aims to study and represent from literature how evolutionary algorithms are used to solve different search and optimisation problems in the area of software engineering. Evolutionary algorithms are methods, which imitate the natural evolution process. An artificial evolution process evaluates fitness of each individual, which are solution candidates. The next population of candidate solutions is formed by using the good properties of the current population by applying different mutation and crossover operations. Different kinds of evolutionary algorithm applications related to software engineering were searched in the literature. Applications were classified and represented. Also the necessary basics about evolutionary algorithms were presented. It was concluded, that majority of evolutionary algorithm applications related to software engineering were about software design or testing. For example, there were applications about classifying software production data, project scheduling, static task scheduling related to parallel computing, allocating modules to subsystems, N-version programming, test data generation and generating an integration test order. Many applications were experimental testing rather than ready for real production use. There were also some Computer Aided Software Engineering tools based on evolutionary algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Diplomityön tarkoituksena on optimoida asiakkaiden sähkölaskun laskeminen hajautetun laskennan avulla. Älykkäiden etäluettavien energiamittareiden tullessa jokaiseen kotitalouteen, energiayhtiöt velvoitetaan laskemaan asiakkaiden sähkölaskut tuntiperusteiseen mittaustietoon perustuen. Kasvava tiedonmäärä lisää myös tarvittavien laskutehtävien määrää. Työssä arvioidaan vaihtoehtoja hajautetun laskennan toteuttamiseksi ja luodaan tarkempi katsaus pilvilaskennan mahdollisuuksiin. Lisäksi ajettiin simulaatioita, joiden avulla arvioitiin rinnakkaislaskennan ja peräkkäislaskennan eroja. Sähkölaskujen oikeinlaskemisen tueksi kehitettiin mittauspuu-algoritmi.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tehoelektoniikkalaitteella tarkoitetaan ohjaus- ja säätöjärjestelmää, jolla sähköä muokataan saatavilla olevasta muodosta haluttuun uuteen muotoon ja samalla hallitaan sähköisen tehon virtausta lähteestä käyttökohteeseen. Tämä siis eroaa signaalielektroniikasta, jossa sähköllä tyypillisesti siirretään tietoa hyödyntäen eri tiloja. Tehoelektroniikkalaitteita vertailtaessa katsotaan yleensä niiden luotettavuutta, kokoa, tehokkuutta, säätötarkkuutta ja tietysti hintaa. Tyypillisiä tehoelektroniikkalaitteita ovat taajuudenmuuttajat, UPS (Uninterruptible Power Supply) -laitteet, hitsauskoneet, induktiokuumentimet sekä erilaiset teholähteet. Perinteisesti näiden laitteiden ohjaus toteutetaan käyttäen mikroprosessoreja, ASIC- (Application Specific Integrated Circuit) tai IC (Intergrated Circuit) -piirejä sekä analogisia säätimiä. Tässä tutkimuksessa on analysoitu FPGA (Field Programmable Gate Array) -piirien soveltuvuutta tehoelektroniikan ohjaukseen. FPGA-piirien rakenne muodostuu erilaisista loogisista elementeistä ja niiden välisistä yhdysjohdoista.Loogiset elementit ovat porttipiirejä ja kiikkuja. Yhdysjohdot ja loogiset elementit ovat piirissä kiinteitä eikä koostumusta tai lukumäärää voi jälkikäteen muuttaa. Ohjelmoitavuus syntyy elementtien välisistä liitännöistä. Piirissä on lukuisia, jopa miljoonia kytkimiä, joiden asento voidaan asettaa. Siten piirin peruselementeistä voidaan muodostaa lukematon määrä erilaisia toiminnallisia kokonaisuuksia. FPGA-piirejä on pitkään käytetty kommunikointialan tuotteissa ja siksi niiden kehitys on viime vuosina ollut nopeaa. Samalla hinnat ovat pudonneet. Tästä johtuen FPGA-piiristä on tullut kiinnostava vaihtoehto myös tehoelektroniikkalaitteiden ohjaukseen. Väitöstyössä FPGA-piirien käytön soveltuvuutta on tutkittu käyttäen kahta vaativaa ja erilaista käytännön tehoelektroniikkalaitetta: taajuudenmuuttajaa ja hitsauskonetta. Molempiin testikohteisiin rakennettiin alan suomalaisten teollisuusyritysten kanssa soveltuvat prototyypit,joiden ohjauselektroniikka muutettiin FPGA-pohjaiseksi. Lisäksi kehitettiin tätä uutta tekniikkaa hyödyntävät uudentyyppiset ohjausmenetelmät. Prototyyppien toimivuutta verrattiin vastaaviin perinteisillä menetelmillä ohjattuihin kaupallisiin tuotteisiin ja havaittiin FPGA-piirien mahdollistaman rinnakkaisen laskennantuomat edut molempien tehoelektroniikkalaitteiden toimivuudessa. Työssä on myösesitetty uusia menetelmiä ja työkaluja FPGA-pohjaisen säätöjärjestelmän kehitykseen ja testaukseen. Esitetyillä menetelmillä tuotteiden kehitys saadaan mahdollisimman nopeaksi ja tehokkaaksi. Lisäksi työssä on kehitetty FPGA:n sisäinen ohjaus- ja kommunikointiväylärakenne, joka palvelee tehoelektroniikkalaitteiden ohjaussovelluksia. Uusi kommunikointirakenne edistää lisäksi jo tehtyjen osajärjestelmien uudelleen käytettävyyttä tulevissa sovelluksissa ja tuotesukupolvissa.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Industry's growing need for higher productivity is placing new demands on mechanisms connected with electrical motors, because these can easily lead to vibration problems due to fast dynamics. Furthermore, the nonlinear effects caused by a motor frequently reduce servo stability, which diminishes the controller's ability to predict and maintain speed. Hence, the flexibility of a mechanism and its control has become an important area of research. The basic approach in control system engineering is to assume that the mechanism connected to a motor is rigid, so that vibrations in the tool mechanism, reel, gripper or any apparatus connected to the motor are not taken into account. This might reduce the ability of the machine system to carry out its assignment and shorten the lifetime of the equipment. Nonetheless, it is usually more important to know how the mechanism, or in other words the load on the motor, behaves. A nonlinear load control method for a permanent magnet linear synchronous motor is developed and implemented in the thesis. The purpose of the controller is to track a flexible load to the desired velocity reference as fast as possible and without awkward oscillations. The control method is based on an adaptive backstepping algorithm with its stability ensured by the Lyapunov stability theorem. As a reference controller for the backstepping method, a hybrid neural controller is introduced in which the linear motor itself is controlled by a conventional PI velocity controller and the vibration of the associated flexible mechanism is suppressed from an outer control loop using a compensation signal from a multilayer perceptron network. To avoid the local minimum problem entailed in neural networks, the initial weights are searched for offline by means of a differential evolution algorithm. The states of a mechanical system for controllers are estimated using the Kalman filter. The theoretical results obtained from the control design are validated with the lumped mass model for a mechanism. Generalization of the mechanism allows the methods derived here to be widely implemented in machine automation. The control algorithms are first designed in a specially introduced nonlinear simulation model and then implemented in the physical linear motor using a DSP (Digital Signal Processor) application. The measurements prove that both controllers are capable of suppressing vibration, but that the backstepping method is superior to others due to its accuracy of response and stability properties.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of the work is to study fluid flow behavior through a pinch valve and to estimate the flow coefficient (KV ) at different opening positions of the valve. The flow inside a compressed valve is more complex than in a straight pipe, and it is one of main topics of interest for engineers in process industry. In the present work, we have numerically simulated compressed valve flow at different opening positions. In order to simulate the flow through pinch valve, several models of the elastomeric valve tube (pinch valve tube) at different opening positions were constructed in 2D-axisymmetric and 3D geometries. The numerical simulations were performed with the CFD packages; ANSYS FLUENT and ANSYS CFX by using parallel computing. The distributions of static pressure, velocity and turbulent kinetic energy have been studied at different opening positions of the valve in both 2D-axisymmetric and 3D experiments. The flow coefficient (KV ) values have been measured at different valve openings and are compared between 2D-axisymmetric and 3D simulation results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tässä työssä verrattiin monikerrosperseptronin, radiaalikantafunktioverkon, tukivektoriregression ja relevanssivektoriregression soveltuvuutta robottikäden otemallinnukseen. Menetelmille ohjelmoitiin koeympäristö Matlabiin, jossa mallit koestettiin kolmiulotteisella kappaledatalla. Koejärjestely sisälsi kaksi vaihetta. Kokeiden ensimmäisessä vaiheessa menetelmille haettiin sopivat parametrit ja toisessa vaiheessa menetelmät koestettiin. Kokeilla kerättiin dataa menetelmien keskinäiseen vertailuun. Vertailussa huomioitiin laskentanopeus, koulutusaika ja tarkkuus. Tukivektoriregressio löydettiin potentiaaliseksi vaihtoehdoksi mallintamiseen. Tukivektoriregression koetuloksia analysoitiin muita menetelmiä enemmän hyvien koetulosten takia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One challenge on data assimilation (DA) methods is how the error covariance for the model state is computed. Ensemble methods have been proposed for producing error covariance estimates, as error is propagated in time using the non-linear model. Variational methods, on the other hand, use the concepts of control theory, whereby the state estimate is optimized from both the background and the measurements. Numerical optimization schemes are applied which solve the problem of memory storage and huge matrix inversion needed by classical Kalman filter methods. Variational Ensemble Kalman filter (VEnKF), as a method inspired the Variational Kalman Filter (VKF), enjoys the benefits from both ensemble methods and variational methods. It avoids filter inbreeding problems which emerge when the ensemble spread underestimates the true error covariance. In VEnKF this is tackled by resampling the ensemble every time measurements are available. One advantage of VEnKF over VKF is that it needs neither tangent linear code nor adjoint code. In this thesis, VEnKF has been applied to a two-dimensional shallow water model simulating a dam-break experiment. The model is a public code with water height measurements recorded in seven stations along the 21:2 m long 1:4 m wide flume’s mid-line. Because the data were too sparse to assimilate the 30 171 model state vector, we chose to interpolate the data both in time and in space. The results of the assimilation were compared with that of a pure simulation. We have found that the results revealed by the VEnKF were more realistic, without numerical artifacts present in the pure simulation. Creating a wrapper code for a model and DA scheme might be challenging, especially when the two were designed independently or are poorly documented. In this thesis we have presented a non-intrusive approach of coupling the model and a DA scheme. An external program is used to send and receive information between the model and DA procedure using files. The advantage of this method is that the model code changes needed are minimal, only a few lines which facilitate input and output. Apart from being simple to coupling, the approach can be employed even if the two were written in different programming languages, because the communication is not through code. The non-intrusive approach is made to accommodate parallel computing by just telling the control program to wait until all the processes have ended before the DA procedure is invoked. It is worth mentioning the overhead increase caused by the approach, as at every assimilation cycle both the model and the DA procedure have to be initialized. Nonetheless, the method can be an ideal approach for a benchmark platform in testing DA methods. The non-intrusive VEnKF has been applied to a multi-purpose hydrodynamic model COHERENS to assimilate Total Suspended Matter (TSM) in lake Säkylän Pyhäjärvi. The lake has an area of 154 km2 with an average depth of 5:4 m. Turbidity and chlorophyll-a concentrations from MERIS satellite images for 7 days between May 16 and July 6 2009 were available. The effect of the organic matter has been computationally eliminated to obtain TSM data. Because of computational demands from both COHERENS and VEnKF, we have chosen to use 1 km grid resolution. The results of the VEnKF have been compared with the measurements recorded at an automatic station located at the North-Western part of the lake. However, due to TSM data sparsity in both time and space, it could not be well matched. The use of multiple automatic stations with real time data is important to elude the time sparsity problem. With DA, this will help in better understanding the environmental hazard variables for instance. We have found that using a very high ensemble size does not necessarily improve the results, because there is a limit whereby additional ensemble members add very little to the performance. Successful implementation of the non-intrusive VEnKF and the ensemble size limit for performance leads to an emerging area of Reduced Order Modeling (ROM). To save computational resources, running full-blown model in ROM is avoided. When the ROM is applied with the non-intrusive DA approach, it might result in a cheaper algorithm that will relax computation challenges existing in the field of modelling and DA.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Memristive computing refers to the utilization of the memristor, the fourth fundamental passive circuit element, in computational tasks. The existence of the memristor was theoretically predicted in 1971 by Leon O. Chua, but experimentally validated only in 2008 by HP Labs. A memristor is essentially a nonvolatile nanoscale programmable resistor — indeed, memory resistor — whose resistance, or memristance to be precise, is changed by applying a voltage across, or current through, the device. Memristive computing is a new area of research, and many of its fundamental questions still remain open. For example, it is yet unclear which applications would benefit the most from the inherent nonlinear dynamics of memristors. In any case, these dynamics should be exploited to allow memristors to perform computation in a natural way instead of attempting to emulate existing technologies such as CMOS logic. Examples of such methods of computation presented in this thesis are memristive stateful logic operations, memristive multiplication based on the translinear principle, and the exploitation of nonlinear dynamics to construct chaotic memristive circuits. This thesis considers memristive computing at various levels of abstraction. The first part of the thesis analyses the physical properties and the current-voltage behaviour of a single device. The middle part presents memristor programming methods, and describes microcircuits for logic and analog operations. The final chapters discuss memristive computing in largescale applications. In particular, cellular neural networks, and associative memory architectures are proposed as applications that significantly benefit from memristive implementation. The work presents several new results on memristor modeling and programming, memristive logic, analog arithmetic operations on memristors, and applications of memristors. The main conclusion of this thesis is that memristive computing will be advantageous in large-scale, highly parallel mixed-mode processing architectures. This can be justified by the following two arguments. First, since processing can be performed directly within memristive memory architectures, the required circuitry, processing time, and possibly also power consumption can be reduced compared to a conventional CMOS implementation. Second, intrachip communication can be naturally implemented by a memristive crossbar structure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Video transcoding refers to the process of converting a digital video from one format into another format. It is a compute-intensive operation. Therefore, transcoding of a large number of simultaneous video streams requires a large amount of computing resources. Moreover, to handle di erent load conditions in a cost-e cient manner, the video transcoding service should be dynamically scalable. Infrastructure as a Service Clouds currently offer computing resources, such as virtual machines, under the pay-per-use business model. Thus the IaaS Clouds can be leveraged to provide a coste cient, dynamically scalable video transcoding service. To use computing resources e ciently in a cloud computing environment, cost-e cient virtual machine provisioning is required to avoid overutilization and under-utilization of virtual machines. This thesis presents proactive virtual machine resource allocation and de-allocation algorithms for video transcoding in cloud computing. Since users' requests for videos may change at di erent times, a check is required to see if the current computing resources are adequate for the video requests. Therefore, the work on admission control is also provided. In addition to admission control, temporal resolution reduction is used to avoid jitters in a video. Furthermore, in a cloud computing environment such as Amazon EC2, the computing resources are more expensive as compared with the storage resources. Therefore, to avoid repetition of transcoding operations, a transcoded video needs to be stored for a certain time. To store all videos for the same amount of time is also not cost-e cient because popular transcoded videos have high access rate while unpopular transcoded videos are rarely accessed. This thesis provides a cost-e cient computation and storage trade-o strategy, which stores videos in the video repository as long as it is cost-e cient to store them. This thesis also proposes video segmentation strategies for bit rate reduction and spatial resolution reduction video transcoding. The evaluation of proposed strategies is performed using a message passing interface based video transcoder, which uses a coarse-grain parallel processing approach where video is segmented at group of pictures level.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The whole research of the current Master Thesis project is related to Big Data transfer over Parallel Data Link and my main objective is to assist the Saint-Petersburg National Research University ITMO research team to accomplish this project and apply Green IT methods for the data transfer system. The goal of the team is to transfer Big Data by using parallel data links with SDN Openflow approach. My task as a team member was to compare existing data transfer applications in case to verify which results the highest data transfer speed in which occasions and explain the reasons. In the context of this thesis work a comparison between 5 different utilities was done, which including Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts where developed which consist of creating random binary data to be incompressible to have fair comparison between utilities, execute the Utilities with specified parameters, create log files, results, system parameters, and plot graphs to compare the results. Transferring such an enormous variety of data can take a long time, and hence, the necessity appears to reduce the energy consumption to make them greener. In the context of Green IT approach, our team used Cloud Computing infrastructure called OpenStack. It’s more efficient to allocated specific amount of hardware resources to test different scenarios rather than using the whole resources from our testbed. Testing our implementation with OpenStack infrastructure results that the virtual channel does not consist of any traffic and we can achieve the highest possible throughput. After receiving the final results we are in place to identify which utilities produce faster data transfer in different scenarios with specific TCP parameters and we can use them in real network data links.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis gives an overview of the use of the level set methods in the field of image science. The similar fast marching method is discussed for comparison, also the narrow band and the particle level set methods are introduced. The level set method is a numerical scheme for representing, deforming and recovering structures in an arbitrary dimensions. It approximates and tracks the moving interfaces, dynamic curves and surfaces. The level set method does not define how and why some boundary is advancing the way it is but simply represents and tracks the boundary. The principal idea of the level set method is to represent the N dimensional boundary in the N+l dimensions. This gives the generality to represent even the complex boundaries. The level set methods can be powerful tools to represent dynamic boundaries, but they can require lot of computing power. Specially the basic level set method have considerable computational burden. This burden can be alleviated with more sophisticated versions of the level set algorithm like the narrow band level set method or with the programmable hardware implementation. Also the parallel approach can be used in suitable applications. It is concluded that these methods can be used in a quite broad range of image applications, like computer vision and graphics, scientific visualization and also to solve problems in computational physics. Level set methods and methods derived and inspired by it will be in the front line of image processing also in the future.