33 results for General-purpose computing on graphics processing units (GPGPU)
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
In this work, the feasibility of the floating-gate technology in analog computing platforms in a scaled-down general-purpose CMOS technology is considered. When the technology is scaled down, the performance of analog circuits tends to get worse because the process parameters are optimized for digital transistors and the scaling involves the reduction of supply voltages. Generally, the challenge in analog circuit design is that all salient design metrics such as power, area, bandwidth and accuracy are interrelated. Furthermore, poor flexibility, i.e. lack of reconfigurability, reuse of IP etc., can be considered the most severe weakness of analog hardware. On this account, digital calibration schemes are often required for improved performance or yield enhancement, whereas high flexibility/reconfigurability cannot be easily achieved. Here, it is discussed whether it is possible to work around these obstacles by using floating-gate transistors (FGTs), and the problems associated with the practical implementation are analyzed. FGT technology is attractive because it is electrically programmable and also features a charge-based built-in non-volatile memory. Apart from being ideal for canceling circuit non-idealities due to process variations, FGTs can also be used as computational or adaptive elements in analog circuits. The nominal gate oxide thickness in deep sub-micron (DSM) processes is too thin to support robust charge retention, and consequently the FGT becomes leaky. In principle, non-leaky FGTs can be implemented in a scaled-down process without any special masks by using “double”-oxide transistors intended for providing devices that operate with higher supply voltages than general-purpose devices. However, in practice the technology scaling poses several challenges, which are addressed in this thesis. To provide a sufficiently wide-ranging survey, six prototype chips of varying complexity were implemented in four different DSM process nodes and investigated from this perspective. The focus is on non-leaky FGTs, but the presented autozeroing floating-gate amplifier (AFGA) demonstrates that leaky FGTs may also find a use. The simplest test structures contain only a few transistors, whereas the most complex experimental chip is an implementation of a spiking neural network (SNN) comprising thousands of active and passive devices. More precisely, it is a fully connected (256 FGT synapses) two-layer SNN in which the adaptive properties of the FGTs are exploited. A compact realization of Spike Timing Dependent Plasticity (STDP) within the SNN is one of the key contributions of this thesis. Finally, the considerations in this thesis extend beyond CMOS to emerging nanodevices. To this end, one promising emerging nanoscale circuit element, the memristor, is reviewed and its applicability for analog processing is considered. Furthermore, it is discussed how FGT technology can be used to prototype computation paradigms compatible with these emerging two-terminal nanoscale devices in a mature and widely available CMOS technology.
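For readers unfamiliar with STDP, the following minimal Python sketch shows the standard pair-based exponential weight-update rule that STDP circuits typically approximate; the time constants and learning rates are generic textbook values and are not parameters of the thesis or its FGT synapses.

    # Pair-based STDP: a presynaptic spike shortly before a postsynaptic spike
    # potentiates the synapse; the reverse order depresses it. Parameters are
    # generic textbook values, not those of the FGT synapses in the thesis.
    import math

    def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
        """Weight change for spike-time difference dt = t_post - t_pre (ms)."""
        if dt >= 0:                                  # pre before post -> potentiation
            return a_plus * math.exp(-dt / tau_plus)
        return -a_minus * math.exp(dt / tau_minus)   # post before pre -> depression

    for dt in (-40, -10, 10, 40):
        print(dt, round(stdp_dw(dt), 5))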
Abstract:
Many-core systems are emerging from the need for more computational power and power efficiency. However, many issues still surround many-core systems. These systems need specialized software before they can be fully utilized, and the hardware itself may differ from conventional computing systems. To gain efficiency from a many-core system, programs need to be parallelized. In many-core systems the cores are small and less powerful than the cores used in traditional computing, so running a conventional program is not an efficient option. In addition, in Network-on-Chip-based processors the network may become congested and the cores may work at different speeds. In this thesis, a dynamic load balancing method is proposed and tested on the Intel 48-core Single-Chip Cloud Computer by parallelizing a fault simulator. The maximum speedup is difficult to obtain due to severe bottlenecks in the system. In order to exploit all the available parallelism of the Single-Chip Cloud Computer, a runtime approach capable of dynamically balancing the load during the fault simulation process is used. The proposed dynamic fault simulation approach on the Single-Chip Cloud Computer shows up to 45X speedup compared to a serial fault simulation approach. Many-core systems can draw enormous amounts of power, and if this power is not controlled properly, the system might get damaged. One way to manage power is to set a power budget for the system. However, if this power is drawn by just a few of the many cores, those few cores become extremely hot and may get damaged. Due to the increase in power density, multiple thermal sensors are deployed on the chip area to provide real-time temperature feedback for thermal management techniques. Thermal sensor accuracy is highly prone to intra-die process variation and aging phenomena. These factors lead to a situation where thermal sensor values drift from the nominal values, which necessitates efficient calibration techniques to be applied before the sensor values are used. In addition, cores in modern many-core systems support dynamic voltage and frequency scaling. Thermal sensors located on cores are sensitive to the core's current voltage level, meaning that dedicated calibration is needed for each voltage level. In this thesis, a general-purpose software-based auto-calibration approach is also proposed to calibrate thermal sensors over a range of voltage levels.
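The dynamic load balancing idea can be illustrated with a minimal Python sketch in which idle workers pull the next fault from a shared queue, so faster cores automatically process more faults; the worker count, chunk size and function names are illustrative assumptions rather than the SCC implementation described above.

    # Minimal sketch of dynamic load balancing: idle workers pull the next
    # work unit from a shared queue, so faster cores automatically do more work.
    # Names, counts and granularity are illustrative; not the SCC implementation.
    from multiprocessing import Pool

    def simulate_fault(fault_id):
        """Placeholder for simulating one fault against the test set."""
        return fault_id, fault_id % 2 == 0   # (fault, detected?) -- dummy result

    if __name__ == "__main__":
        faults = range(10_000)
        with Pool(processes=48) as pool:
            # imap_unordered hands out small chunks on demand, which is the
            # dynamic (pull-based) balancing strategy, unlike a static split.
            results = list(pool.imap_unordered(simulate_fault, faults, chunksize=16))
        detected = sum(1 for _, hit in results if hit)
        print(f"detected {detected} of {len(results)} faults")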
Abstract:
The starting point of this profitability study was that Yhtyneet Sahat Oy's Kaukas sawmill and Luumäki further-processing plant wanted to determine the profitability of a pellet plant in the current market situation. The work is a techno-economic analysis, i.e. a feasibility study. The pelletizing process is technically simple and does not require high-technology equipment. The industry is quite new worldwide. In Finland the pellet market is still small and undeveloped, but it has grown in recent years. The majority of domestic production is exported. The initial production values obtained in the investment calculation process, together with the definition of the cost structure, form the basis for the actual profitability calculations. From the calculations, the most common financial indicators related to investments were determined, and the most sensitive variables were examined and discussed with the help of sensitivity analysis.
Abstract:
The work examines digital signal processor families from three different manufacturers. The aim is to study the technical suitability of the processors for a frequency converter product family under design. The first part of the work reviews the structure of a frequency converter and describes the most common control methods for squirrel-cage induction motors. The operation of a digital signal processor and its integrated peripherals is also examined. The focus of the work is on comparing the technical characteristics of the processors. Among other things, the internal architecture of the processors, the properties of their instruction sets, interrupt service latency, and the features of the peripherals are compared. The desired behaviour of the peripherals, in particular the analog-to-digital converter, is important for the motor control software. The processor families included in the work are scored with respect to the examined characteristics. As the result of the comparison, the processor family and processor type technically best suited to the intended purpose are presented. However, the work cannot give a general ranking of the studied processors.
Abstract:
As the development of integrated circuit technology continues to follow Moore's law, the complexity of circuits increases exponentially. Traditional hardware description languages such as VHDL and Verilog are no longer powerful enough to cope with this level of complexity and do not provide facilities for hardware/software codesign. Languages such as SystemC are intended to solve these problems by combining the expressive power of high-level programming languages with the hardware-oriented facilities of hardware description languages. To fully replace older languages in the design flow of digital systems, SystemC should also be synthesizable. The devices required by modern high-speed networks often share the same tight constraints on, e.g., size, power consumption and price with embedded systems, but also have very demanding real-time and quality-of-service requirements that are difficult to satisfy with general-purpose processors. Dedicated hardware blocks of an application-specific instruction set processor are one way to combine fast processing speed, energy efficiency, flexibility and relatively low time-to-market. Common features can be identified in the network processing domain, making it possible to develop specialized but configurable processor architectures. One such architecture is TACO, which is based on the transport triggered architecture. The architecture offers a high degree of parallelism and modularity and greatly simplified instruction decoding. For this M.Sc.(Tech) thesis, a simulation environment for the TACO architecture was developed with SystemC 2.2, using an old version written with SystemC 1.0 as a starting point. The environment enables rapid design space exploration by providing facilities for hw/sw codesign and simulation and an extendable library of automatically configured reusable hardware blocks. Other topics covered are the differences between SystemC 1.0 and 2.2 from the viewpoint of hardware modeling, and the compilation of a SystemC model into synthesizable VHDL with the Celoxica Agility SystemC Compiler. A simulation model of a processor for TCP/IP packet validation was designed and tested as a test case for the environment.
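The transport triggered architecture underlying TACO can be illustrated with a toy Python sketch in which a program consists only of data moves, and writing to a function unit's trigger port is what starts its operation; all names here are invented for illustration and do not come from the TACO or SystemC models.

    # Toy model of a transport triggered architecture (TTA): instructions are
    # data moves, and writing to a function unit's trigger port fires it.
    # All names are invented for illustration; this is not the TACO model.
    class AddUnit:
        def __init__(self):
            self.operand = 0       # plain operand port
            self.result = 0        # result port
        def trigger(self, value):  # writing the trigger port fires the addition
            self.result = self.operand + value

    adder = AddUnit()
    registers = {"r1": 3, "r2": 4, "r3": 0}

    # A TTA "program" is just a list of moves: source -> destination port.
    adder.operand = registers["r1"]   # move r1 -> adder.operand
    adder.trigger(registers["r2"])    # move r2 -> adder.trigger (starts the add)
    registers["r3"] = adder.result    # move adder.result -> r3
    print(registers["r3"])            # 7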
Abstract:
Cloud computing enables on-demand network access to shared resources (e.g., computation, networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort. Cloud computing refers to both the applications delivered as services over the Internet and the hardware and system software in the data centers. Software as a service (SaaS) is part of cloud computing; it is one of the cloud service models. SaaS is software deployed as a hosted service and accessed over the Internet. In SaaS, the consumer uses the provider's applications running in the cloud. SaaS separates the possession and ownership of software from its use. The applications can be accessed from any device through a thin client interface. A typical SaaS application is used with a web browser based on monthly pricing. In this thesis, the characteristics of cloud computing and SaaS are presented, and a few implementation platforms for SaaS are discussed. Then, four different SaaS implementation cases and one transformation case are examined. The pros and cons of SaaS are studied based on literature references and an analysis of the SaaS implementations and the transformation case. The analysis is done both from the customer's and the service provider's point of view. In addition, the pros and cons of on-premises software are listed. The purpose of this thesis is to find out when SaaS should be utilized and when it is better to choose traditional on-premises software. The qualities of SaaS bring many benefits both for the customer and the provider. A customer should utilize SaaS when it provides cost savings, ease, and scalability over on-premises software. SaaS is reasonable when the customer does not need tailoring but only a simple, general-purpose service, and the application supports the customer's core business. A provider should utilize SaaS when it offers cost savings, scalability, faster development, and a wider customer base over on-premises software. It is wise to choose SaaS when the application is cheap, aimed at the mass market, needs frequent updating, needs high-performance computing, needs to store large amounts of data, or there is some other direct value from the cloud infrastructure.
Abstract:
Multiprocessing is a promising solution to meet the requirements of near-future applications. To get the full benefit from parallel processing, a many-core system needs an efficient on-chip communication architecture. Network-on-Chip (NoC) is a general-purpose communication concept that offers high throughput, reduced power consumption, and keeps complexity in check through a regular composition of basic building blocks. This thesis presents power-efficient communication approaches for networked many-core systems. We address a range of issues important for designing power-efficient many-core systems at two different levels: the network level and the router level. From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today's and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate the high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance in 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques. From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented. Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to the latest NoC architectures. The thesis concludes that carefully co-designed elements from different network levels enable considerable power savings for many-core systems.
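Congestion-aware adaptive routing of the kind mentioned above can be sketched as follows: among the output ports that move a packet closer to its destination in a 2D mesh, the router picks the one whose neighbour currently reports the lowest buffer occupancy. The Python sketch below is purely illustrative and is not the routing algorithm proposed in the thesis.

    # Conceptual sketch of congestion-aware minimal routing in a 2D-mesh NoC:
    # among the productive output ports, pick the least congested one.
    # Purely illustrative; not the algorithm proposed in the thesis.
    def route(current, dest, congestion):
        """current, dest: (x, y) router coordinates.
        congestion: dict mapping port name ('E','W','N','S') to buffer occupancy."""
        cx, cy = current
        dx, dy = dest
        candidates = []
        if dx > cx: candidates.append('E')
        if dx < cx: candidates.append('W')
        if dy > cy: candidates.append('N')
        if dy < cy: candidates.append('S')
        if not candidates:          # already at the destination router
            return 'LOCAL'
        # Adaptive choice: least-congested productive port.
        return min(candidates, key=lambda port: congestion[port])

    # Example: two productive directions, east link is busier than north.
    print(route((1, 1), (3, 4), {'E': 7, 'W': 0, 'N': 2, 'S': 1}))  # -> 'N'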
Abstract:
The purpose of this research was to provide a deeper insight into the consequences of electronic human resource management (e-HRM) for line managers. The consequences are viewed as used information system (IS) potentials pertaining to the moderate voluntaristic category of consequences. Due to the need to contextualize the research and draw on line managers' personal experiences, a qualitative approach in a case study setting was selected. The empirical part of the research is loosely based on the literature on HRM and e-HRM, and it was conducted in an industrial private sector company. In this thesis, method triangulation was utilized: nine semi-structured interviews, conducted in a European setting, formed the main method of data collection and analysis, while complementary data such as HRM documentation and statistics on e-HRM system usage were utilized as background information to help put the results into context. E-HRM has been partly taken into use in the case study company. Line managers tend to use e-HRM when a particular task requires it, but they are not familiar with all the features and possibilities that e-HRM has to offer. The advantages of e-HRM are in line with the company's goals; they include, for example, transparency of data, process consistency, and having an efficient and easy-to-use tool at one's disposal. However, several unintended, even contradictory, and mainly negative outcomes can also be identified, such as over-complicated processes, insecurity in using the tool, and a lack of co-operation with HR professionals. The use of e-HRM and managers' perceptions regarding e-HRM affect the way in which managers perceive the consequences of e-HRM on their work. Overall, the consequences of e-HRM are divergent, even contradictory. The managers who considered e-HRM mostly beneficial to their work found that e-HRM affects their work by providing information and increasing efficiency. Those managers who mostly perceived challenges in e-HRM did not think that e-HRM had affected their role or their work. Even though the perceptions regarding e-HRM and its consequences may reflect the strategies, the distribution of work, and the ways of working in HRM in general, and cannot be generalized as such, this research contributes to the field of e-HRM and provides new perspectives on e-HRM both in the case study organization and in the academic field in general.
Abstract:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult, as the current popular programming languages are inherently sequential and introducing parallelism is typically left to the programmer. Dataflow, however, is inherently parallel, describing an application as a directed graph where nodes represent calculations and edges represent data dependencies in the form of queues. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit and thereby also the parallelism. Once a node has sufficient inputs available, it can, independently of any other node, perform calculations, consume inputs, and produce outputs. Dataflow models have existed for several decades and have become popular for describing signal processing applications, as the graph representation is a very natural representation within this field; digital filters are typically described with boxes and arrows also in textbooks. Dataflow is also becoming more interesting in other domains, and in principle any application working on an information stream fits the dataflow paradigm. Such applications include, among others, network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be used within reconfigurable video coding. Describing a video coder as a dataflow network instead of with conventional programming languages makes the coder more readable, as it describes how the video data flows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for dataflow to be more widely used. The explicit parallelism of a dataflow program is descriptive and enables improved utilization of the available processing units; however, the independent nodes also imply that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes run concurrently on one processor core. There exist several dataflow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable models, with minimal scheduling overhead, to dynamic ones where each firing requires a firing rule to be evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling; however, the strong encapsulation of dataflow nodes enables analysis, and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with finding the few scheduling decisions that must be made at run-time, while most decisions are pre-calculated. The result is then an, as small as possible, set of static schedules that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information needed to generate a minimal but complete model to be used for model checking. The model must describe everything that may affect the scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to find the actual schedules, a set of scheduling strategies is defined which is able to produce quasi-static schedulers for a wide range of applications. The results of this work show that actor composition with quasi-static scheduling can be used to transform dataflow programs to fit many different computer architectures with different types and numbers of cores. This, in turn, enables dataflow to provide a more platform-independent representation, as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is optimized by the development tools, in the context of design space exploration, to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
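The dataflow model described above can be illustrated with a minimal Python sketch of an actor that communicates only through FIFO queues and fires when a token is available on every input; the actor and its firing rule are illustrative and are not RVC-CAL code.

    # Minimal sketch of the dataflow idea: nodes communicate only through FIFO
    # queues, and a node may fire as soon as its inputs are available.
    from collections import deque

    class Actor:
        def __init__(self, func, n_inputs):
            self.func = func
            self.inputs = [deque() for _ in range(n_inputs)]
            self.output = deque()

        def can_fire(self):
            # Firing rule: one token available on every input queue.
            return all(q for q in self.inputs)

        def fire(self):
            tokens = [q.popleft() for q in self.inputs]
            self.output.append(self.func(*tokens))

    # Example: an adder actor with two input streams.
    add = Actor(lambda a, b: a + b, n_inputs=2)
    add.inputs[0].extend([1, 2, 3])
    add.inputs[1].extend([10, 20, 30])
    while add.can_fire():          # a trivial scheduler: fire while possible
        add.fire()
    print(list(add.output))        # [11, 22, 33]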
Abstract:
Internet of Things (IoT) technologies are developing rapidly, and therefore several standards of interconnection protocols and platforms exist. The existence of heterogeneous protocols and platforms has become a critical challenge for IoT system developers. To mitigate this challenge, a few alliances and organizations have taken the initiative to build frameworks that help to integrate application silos. Some of these frameworks focus only on a specific domain, such as home automation. However, the resource constraints of a large proportion of connected devices make it difficult to build an interoperable system using such frameworks. Therefore, a general-purpose, lightweight interoperability framework that can be used for a range of devices is required. To tackle this heterogeneity, this work introduces an embedded, distributed and lightweight service bus, the Lightweight IoT Service bus Architecture (LISA), which fits inside the network stack of a small real-time operating system for constrained nodes. LISA provides a uniform application programming interface for an IoT system on a range of devices with varying resource constraints. It hides platform and protocol variations underneath it, thus facilitating interoperability in IoT implementations. LISA is inspired by the Network on Terminal Architecture, a service-centric open architecture by Nokia Research Center. Unlike many other interoperability frameworks, LISA is designed specifically for resource-constrained nodes, and it provides the essential features of a service bus for easy service-oriented architecture implementation. The presented architecture utilizes an intermediate computing layer, a Fog layer, between the small nodes and the cloud, thereby facilitating the federation of constrained nodes into subnetworks. As a result of the modular and distributed design, the part of LISA running in the Fog layer handles the heavy lifting to assist the lightweight portion of LISA inside the resource-constrained nodes. Furthermore, LISA introduces a new networking paradigm, Node Centric Networking, to route messages across protocol boundaries to facilitate interoperability. This thesis presents a concept implementation of the architecture and creates a foundation for future extension towards a comprehensive interoperability framework for IoT.
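A service bus of the kind described above can be sketched, very roughly, as a registry that maps (node, service) pairs to handlers so that callers address services rather than transport-specific endpoints; every name in the following Python sketch is hypothetical and none of it is taken from the LISA interface.

    # Purely hypothetical sketch of the service-bus idea: nodes register named
    # services with the bus, and callers address a (node, service) pair rather
    # than a transport-specific endpoint. None of these names come from LISA.
    class ServiceBus:
        def __init__(self):
            self.registry = {}                      # (node_id, service) -> handler

        def register(self, node_id, service, handler):
            self.registry[(node_id, service)] = handler

        def invoke(self, node_id, service, payload):
            # In a real bus this dispatch would cross protocol boundaries;
            # here it is just a dictionary lookup.
            handler = self.registry.get((node_id, service))
            if handler is None:
                raise KeyError(f"no service {service!r} on node {node_id!r}")
            return handler(payload)

    bus = ServiceBus()
    bus.register("sensor-7", "read_temperature", lambda _: 21.5)
    print(bus.invoke("sensor-7", "read_temperature", {}))   # 21.5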
Abstract:
As the storage capacity of computers and the speed of both computers and networks grow, user expectations grow as well. Ever more data is stored, and ever more complex reports are produced from that data. As the complexity of the reports grows, however, the time required to gather the data they need should not increase significantly. The purpose of this work is to study and improve the efficiency of the reporting database of an international forest industry company's sales and logistics system, measured in particular by the time spent gathering the data for the reports. The work focuses on identifying the bottlenecks of the current system and on improving the system's performance. More performance will nevertheless be needed in the future, so the work also examines replacing the current general-purpose database with a database designed specifically for reporting. As a result of the work, the time spent gathering report data was reduced and the worst bottlenecks were identified. As the number of users grows, however, the limits of the database's performance will soon be reached, and in the future the database will have to be replaced with one designed specifically for reporting.
Abstract:
The objective of this Master's thesis was to make the NC programming of the multi-function vertical turning lathe used for machining valve bodies more efficient by introducing a CAM program. The study is part of a larger effort related to developing machining subcontracting and to maintaining and improving the company's competitiveness in a business area that currently has good growth prospects. The objective was limited to updating the WinCAM program previously acquired by the company and to utilizing it in the NC programming of the multi-function vertical lathe. The practical goal of the study was to determine the possibilities of using CAM programming and to create a tailored, CAM-based NC programming concept for a pilot case. The experimental part of the study thus consisted of identifying problem areas in current production, surveying the machine's programming needs, and developing methods. The aim was a system usable at the production level, with which the machine could be programmed even with less machine-specific experience. The problems of the existing way of working were the lack of a uniform NC programming practice, both in using existing programs and in creating new ones. The reasons for this were the poor usability of the NC control, particularly for turning. These factors, combined with the coordinate-system management required on a multi-function machine tool, made programming difficult. Differences between workers in the use of NC programs, as well as cast blanks of varying quality, caused significant variation in the production lead time, which also made regulating the machine's load difficult. In implementing the new programming concept, priority was given to good usability and to the seamless integration of the new method with the existing production systems. In the implementation, the work phases of the part family that were clearly suitable for parameterization and could be handled with general-purpose subroutines were identified. For managing the other geometries of the products, a geometry library was created that could be used as a basis for conventional graphical programming. The old way of working and the CAM programming system developed during the thesis were compared on the basis of the efficiency of the NC programs, examined in terms of the machining time of the same work phase. In addition, qualitative aspects related to the usability of the programming environment form an important part of the results. The development and introduction of CAM programming in the pilot case went mostly well and according to the plan. Work phases that were previously difficult to program, such as programming various flange surfaces and hole patterns, were converted for macro use. Graphical programming was applied to the machining of the seal cavity, which had caused problems in turning. On the scale of the whole production, however, the share of NC programming was small, so the machine's productivity could not be affected during the study period. Instead, the amount of workers' 'tacit knowledge' that essentially affects the smoothness of production could be reduced by standardizing programming and transferring the methods found to be effective into the programming system.
Abstract:
The topic of this thesis is how lesions caused by diabetic retinopathy in the retina can be detected from color fundus images using machine vision methods. Methods for equalizing uneven illumination in fundus images, detecting regions of poor image quality due to inadequate illumination, and recognizing abnormal lesions were developed during the work. The developed methods exploit mainly the color information and simple shape features to detect lesions. In addition, a graphical tool for collecting lesion data was developed. The tool was used by an ophthalmologist who marked lesions in the images to help method development and evaluation. The tool is a general-purpose one, and it is thus possible to reuse it in similar projects. The developed methods were tested with a separate test set of 128 color fundus images. From the test results it was calculated how accurately the methods classify abnormal funduses as abnormal (sensitivity) and healthy funduses as normal (specificity). The sensitivity values were 92% for hemorrhages, 73% for red small dots (microaneurysms and small hemorrhages), and 77% for exudates (hard and soft exudates). The specificity values were 75% for hemorrhages, 70% for red small dots, and 50% for exudates. Thus, the developed methods detected hemorrhages accurately and microaneurysms and exudates moderately well.
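The sensitivity and specificity figures quoted above follow the usual definitions, sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), as in the short Python sketch below; the counts are made up purely so that the rates match the hemorrhage figures and are not the actual test-set counts.

    # Sensitivity = TP / (TP + FN): how often an abnormal fundus is flagged abnormal.
    # Specificity = TN / (TN + FP): how often a healthy fundus is flagged normal.
    # The counts below are illustrative, not results from the thesis.
    def sensitivity(tp, fn):
        return tp / (tp + fn)

    def specificity(tn, fp):
        return tn / (tn + fp)

    # Example: 46 of 50 abnormal images detected, 39 of 52 healthy images passed.
    print(f"sensitivity = {sensitivity(46, 4):.0%}")   # 92%
    print(f"specificity = {specificity(39, 13):.0%}")  # 75%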