35 resultados para Multiprocessor computer architectures
Resumo:
As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.
Resumo:
Rapid ongoing evolution of multiprocessors will lead to systems with hundreds of processing cores integrated in a single chip. An emerging challenge is the implementation of reliable and efficient interconnection between these cores as well as other components in the systems. Network-on-Chip is an interconnection approach which is intended to solve the performance bottleneck caused by traditional, poorly scalable communication structures such as buses. However, a large on-chip network involves issues related to congestion problems and system control, for instance. Additionally, faults can cause problems in multiprocessor systems. These faults can be transient faults, permanent manufacturing faults, or they can appear due to aging. To solve the emerging traffic management, controllability issues and to maintain system operation regardless of faults a monitoring system is needed. The monitoring system should be dynamically applicable to various purposes and it should fully cover the system under observation. In a large multiprocessor the distances between components can be relatively long. Therefore, the system should be designed so that the amount of energy-inefficient long-distance communication is minimized. This thesis presents a dynamically clustered distributed monitoring structure. The monitoring is distributed so that no centralized control is required for basic tasks such as traffic management and task mapping. To enable extensive analysis of different Network-on-Chip architectures, an in-house SystemC based simulation environment was implemented. It allows transaction level analysis without time consuming circuit level implementations during early design phases of novel architectures and features. The presented analysis shows that the dynamically clustered monitoring structure can be efficiently utilized for traffic management in faulty and congested Network-on-Chip-based multiprocessor systems. The monitoring structure can be also successfully applied for task mapping purposes. Furthermore, the analysis shows that the presented in-house simulation environment is flexible and practical tool for extensive Network-on-Chip architecture analysis.
Resumo:
-
Resumo:
Technological developments in microprocessors and ICT landscape have made a shift to a new era where computing power is embedded in numerous small distributed objects and devices in our everyday lives. These small computing devices are ne-tuned to perform a particular task and are increasingly reaching our society at every level. For example, home appliances such as programmable washing machines, microwave ovens etc., employ several sensors to improve performance and convenience. Similarly, cars have on-board computers that use information from many di erent sensors to control things such as fuel injectors, spark plug etc., to perform their tasks e ciently. These individual devices make life easy by helping in taking decisions and removing the burden from their users. All these objects and devices obtain some piece of information about the physical environment. Each of these devices is an island with no proper connectivity and information sharing between each other. Sharing of information between these heterogeneous devices could enable a whole new universe of innovative and intelligent applications. The information sharing between the devices is a diffcult task due to the heterogeneity and interoperability of devices. Smart Space vision is to overcome these issues of heterogeneity and interoperability so that the devices can understand each other and utilize services of each other by information sharing. This enables innovative local mashup applications based on shared data between heterogeneous devices. Smart homes are one such example of Smart Spaces which facilitate to bring the health care system to the patient, by intelligent interconnection of resources and their collective behavior, as opposed to bringing the patient into the health system. In addition, the use of mobile handheld devices has risen at a tremendous rate during the last few years and they have become an essential part of everyday life. Mobile phones o er a wide range of different services to their users including text and multimedia messages, Internet, audio, video, email applications and most recently TV services. The interactive TV provides a variety of applications for the viewers. The combination of interactive TV and the Smart Spaces could give innovative applications that are personalized, context-aware, ubiquitous and intelligent by enabling heterogeneous systems to collaborate each other by sharing information between them. There are many challenges in designing the frameworks and application development tools for rapid and easy development of these applications. The research work presented in this thesis addresses these issues. The original publications presented in the second part of this thesis propose architectures and methodologies for interactive and context-aware applications, and tools for the development of these applications. We demonstrated the suitability of our ontology-driven application development tools and rule basedapproach for the development of dynamic, context-aware ubiquitous iTV applications.
Resumo:
The question of the trainability of executive functions and the impact of such training on related cognitive skills has stirred considerable research interest. Despite a number of studies investigating this, the question has not yet been solved. The general aim of this thesis was to investigate two very different types of training of executive functions: laboratory-based computerized training (Studies I-III) and realworld training through bilingualism (Studies IV-V). Bilingualism as a kind of training of executive functions is based on the idea that managing two languages requires executive resources, and previous studies have suggested a bilingual advantage in executive functions. Three executive functions were studied in the present thesis: updating of working memory (WM) contents, inhibition of irrelevant information, and shifting between tasks and mental sets. Studies I-III investigated the effects of computer-based training of WM updating (Study I), inhibition (Study II), and set shifting (Study III) in healthy young adults. All studies showed increased performance on the trained task. More importantly, improvement on an untrained task tapping the trained executive function (near transfer) was seen in Study I and II. None of the three studies showed improvement on untrained tasks tapping some other cognitive function (far transfer) as a result of training. Study I also used PET to investigate the effects of WM updating training on a neurotransmitter closely linked to WM, namely dopamine. The PET results revealed increased striatal dopamine release during WM updating performance as a result of training. Study IV investigated the ability to inhibit task-irrelevant stimuli in bilinguals and monolinguals by using a dichotic listening task. The results showed that the bilinguals exceeded the monolinguals in inhibiting task-irrelevant information. Study V introduced a new, complementary research approach to study the bilingual executive advantage and its underlying mechanisms. To circumvent the methodological problems related to natural groups design, this approach focuses only on bilinguals and examines whether individual differences in bilingual behavior correlate with executive task performances. Using measures that tap the three above-entioned executive functions, the results suggested that more frequent language switching was associated with better set shifting skills, and earlier acquisition of the second language was related to better inhibition skills. In conclusion, the present behavioral results showed that computer-based training of executive functions can improve performance on the trained task and on closely related tasks, but does not yield a more general improvement of cognitive skills. Moreover, the functional neuroimaging results reveal that WM training modulates striatal dopaminergic function, speaking for training-induced neural plasticity in this important neurotransmitter system. With regard to bilingualism, the results provide further support to the idea that bilingualism can enhance executive functions. In addition, the new complementary research approach proposed here provides some clues as to which aspects of everyday bilingual behavior may be related to the advantage in executive functions in bilingual individuals.
Resumo:
Communication, the flow of ideas and information between individuals in a social context, is the heart of educational experience. Constructivism and constructivist theories form the foundation for the collaborative learning processes of creating and sharing meaning in online educational contexts. The Learning and Collaboration in Technology-enhanced Contexts (LeCoTec) course comprised of 66 participants drawn from four European universities (Oulu, Turku, Ghent and Ramon Llull). These participants were split into 15 groups with the express aim of learning about computer-supported collaborative learning (CSCL). The Community of Inquiry model (social, cognitive and teaching presences) provided the content and tools for learning and researching the collaborative interactions in this environment. The sampled comments from the collaborative phase were collected and analyzed at chain-level and group-level, with the aim of identifying the various message types that sustained high learning outcomes. Furthermore, the Social Network Analysis helped to view the density of whole group interactions, as well as the popular and active members within the highly collaborating groups. It was observed that long chains occur in groups having high quality outcomes. These chains were also characterized by Social, Interactivity, Administrative and Content comment-types. In addition, high outcomes were realized from the high interactive cases and high-density groups. In low interactive groups, commenting patterned around the one or two central group members. In conclusion, future online environments should support high-order learning and develop greater metacognition and self-regulation. Moreover, such an environment, with a wide variety of problem solving tools, would enhance interactivity.
Resumo:
The capabilities and thus, design complexity of VLSI-based embedded systems have increased tremendously in recent years, riding the wave of Moore’s law. The time-to-market requirements are also shrinking, imposing challenges to the designers, which in turn, seek to adopt new design methods to increase their productivity. As an answer to these new pressures, modern day systems have moved towards on-chip multiprocessing technologies. New architectures have emerged in on-chip multiprocessing in order to utilize the tremendous advances of fabrication technology. Platform-based design is a possible solution in addressing these challenges. The principle behind the approach is to separate the functionality of an application from the organization and communication architecture of hardware platform at several levels of abstraction. The existing design methodologies pertaining to platform-based design approach don’t provide full automation at every level of the design processes, and sometimes, the co-design of platform-based systems lead to sub-optimal systems. In addition, the design productivity gap in multiprocessor systems remain a key challenge due to existing design methodologies. This thesis addresses the aforementioned challenges and discusses the creation of a development framework for a platform-based system design, in the context of the SegBus platform - a distributed communication architecture. This research aims to provide automated procedures for platform design and application mapping. Structural verification support is also featured thus ensuring correct-by-design platforms. The solution is based on a model-based process. Both the platform and the application are modeled using the Unified Modeling Language. This thesis develops a Domain Specific Language to support platform modeling based on a corresponding UML profile. Object Constraint Language constraints are used to support structurally correct platform construction. An emulator is thus introduced to allow as much as possible accurate performance estimation of the solution, at high abstraction levels. VHDL code is automatically generated, in the form of “snippets” to be employed in the arbiter modules of the platform, as required by the application. The resulting framework is applied in building an actual design solution for an MP3 stereo audio decoder application.
Resumo:
This thesis concentrates on the validation of a generic thermal hydraulic computer code TRACE under the challenges of the VVER-440 reactor type. The code capability to model the VVER-440 geometry and thermal hydraulic phenomena specific to this reactor design has been examined and demonstrated acceptable. The main challenge in VVER-440 thermal hydraulics appeared in the modelling of the horizontal steam generator. The major challenge here is not in the code physics or numerics but in the formulation of a representative nodalization structure. Another VVER-440 specialty, the hot leg loop seals, challenges the system codes functionally in general, but proved readily representable. Computer code models have to be validated against experiments to achieve confidence in code models. When new computer code is to be used for nuclear power plant safety analysis, it must first be validated against a large variety of different experiments. The validation process has to cover both the code itself and the code input. Uncertainties of different nature are identified in the different phases of the validation procedure and can even be quantified. This thesis presents a novel approach to the input model validation and uncertainty evaluation in the different stages of the computer code validation procedure. This thesis also demonstrates that in the safety analysis, there are inevitably significant uncertainties that are not statistically quantifiable; they need to be and can be addressed by other, less simplistic means, ultimately relying on the competence of the analysts and the capability of the community to support the experimental verification of analytical assumptions. This method completes essentially the commonly used uncertainty assessment methods, which are usually conducted using only statistical methods.
Resumo:
This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize the on-chip resources. As the dark silicon era approaches, where the power considerations will allow only a fraction chip to be powered on, judicious resource management will become a key consideration in future designs. Most of the works on resource management treat only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulate the component to application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE) that collectively ensure that each application meets its deadlines using minimal platform resources. In this work several novel architectural enhancements, algorithms and policies are presented to realize the virtual runtime application partitions efficiently. Considering the future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Network on Chips (NoCs) to test the feasibility of our approach. Specifically, we have chosen Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments. Synthesis and simulation results demonstrate VRAP significantly enhances the energy and power efficiency compared to state of the art.
Resumo:
Hur arbetar en framgångsrik programmerare? Uppgifterna att programmera datorspel och att programmera industriella, säkerhetskritiska system verkar tämligen olika. Genom en noggrann empirisk undersökning jämför och kontrasterar avhandlingen dessa två former av programmering och visar att programmering innefattar mer än teknisk förmåga. Med utgångspunkt i hermeneutisk och retorisk teori och med hjälp av både kulturvetenskap och datavetenskap visar avhandlingen att programmerarnas tradition och värderingar är grundläggande för deras arbete, och att båda sorter av programmering kan uppfattas och analyseras genom klassisk texttolkningstradition. Dessutom kan datorprogram betraktas och analyseras med hjälp av klassiska teorier om talproduktion i praktiken - program ses då i detta sammanhang som ett slags yttranden. Allt som allt förespråkar avhandlingen en återkomst till vetenskapens grunder, vilka innebär en ständig och oupphörlig cyklisk rörelse mellan att erfara och att förstå. Detta står i kontrast till en reduktionistisk syn på vetenskapen, som skiljer skarpt mellan subjektivt och objektivt, och på så sätt utgår från möjligheten att uppnå fullständigt vetande. Ofullständigt vetande är tolkandets och hermeneutikens domän. Syftet med avhandlingen är att med hjälp av exempel demonstrera programmeringens kulturella, hermeneutiska och retoriska natur.
Resumo:
The original contribution of this thesis to knowledge are novel digital readout architectures for hybrid pixel readout chips. The thesis presents asynchronous bus-based architecture, a data-node based column architecture and a network-based pixel matrix architecture for data transportation. It is shown that the data-node architecture achieves readout efficiency 99% with half the output rate as a bus-based system. The network-based solution avoids “broken” columns due to some manufacturing errors, and it distributes internal data traffic more evenly across the pixel matrix than column-based architectures. An improvement of > 10% to the efficiency is achieved with uniform and non-uniform hit occupancies. Architectural design has been done using transaction level modeling (TLM) and sequential high-level design techniques for reducing the design and simulation time. It has been possible to simulate tens of column and full chip architectures using the high-level techniques. A decrease of > 10 in run-time is observed using these techniques compared to register transfer level (RTL) design technique. Reduction of 50% for lines-of-code (LoC) for the high-level models compared to the RTL description has been achieved. Two architectures are then demonstrated in two hybrid pixel readout chips. The first chip, Timepix3 has been designed for the Medipix3 collaboration. According to the measurements, it consumes < 1 W/cm^2. It also delivers up to 40 Mhits/s/cm^2 with 10-bit time-over-threshold (ToT) and 18-bit time-of-arrival (ToA) of 1.5625 ns. The chip uses a token-arbitrated, asynchronous two-phase handshake column bus for internal data transfer. It has also been successfully used in a multi-chip particle tracking telescope. The second chip, VeloPix, is a readout chip being designed for the upgrade of Vertex Locator (VELO) of the LHCb experiment at CERN. Based on the simulations, it consumes < 1.5 W/cm^2 while delivering up to 320 Mpackets/s/cm^2, each packet containing up to 8 pixels. VeloPix uses a node-based data fabric for achieving throughput of 13.3 Mpackets/s from the column to the EoC. By combining Monte Carlo physics data with high-level simulations, it has been demonstrated that the architecture meets requirements of the VELO (260 Mpackets/s/cm^2 with efficiency of 99%).