904 resultados para 291605 Processor Architectures


Relevância:

20.00% 20.00%

Publicador:

Resumo:

As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Technological developments in microprocessors and ICT landscape have made a shift to a new era where computing power is embedded in numerous small distributed objects and devices in our everyday lives. These small computing devices are ne-tuned to perform a particular task and are increasingly reaching our society at every level. For example, home appliances such as programmable washing machines, microwave ovens etc., employ several sensors to improve performance and convenience. Similarly, cars have on-board computers that use information from many di erent sensors to control things such as fuel injectors, spark plug etc., to perform their tasks e ciently. These individual devices make life easy by helping in taking decisions and removing the burden from their users. All these objects and devices obtain some piece of information about the physical environment. Each of these devices is an island with no proper connectivity and information sharing between each other. Sharing of information between these heterogeneous devices could enable a whole new universe of innovative and intelligent applications. The information sharing between the devices is a diffcult task due to the heterogeneity and interoperability of devices. Smart Space vision is to overcome these issues of heterogeneity and interoperability so that the devices can understand each other and utilize services of each other by information sharing. This enables innovative local mashup applications based on shared data between heterogeneous devices. Smart homes are one such example of Smart Spaces which facilitate to bring the health care system to the patient, by intelligent interconnection of resources and their collective behavior, as opposed to bringing the patient into the health system. In addition, the use of mobile handheld devices has risen at a tremendous rate during the last few years and they have become an essential part of everyday life. Mobile phones o er a wide range of different services to their users including text and multimedia messages, Internet, audio, video, email applications and most recently TV services. The interactive TV provides a variety of applications for the viewers. The combination of interactive TV and the Smart Spaces could give innovative applications that are personalized, context-aware, ubiquitous and intelligent by enabling heterogeneous systems to collaborate each other by sharing information between them. There are many challenges in designing the frameworks and application development tools for rapid and easy development of these applications. The research work presented in this thesis addresses these issues. The original publications presented in the second part of this thesis propose architectures and methodologies for interactive and context-aware applications, and tools for the development of these applications. We demonstrated the suitability of our ontology-driven application development tools and rule basedapproach for the development of dynamic, context-aware ubiquitous iTV applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today's networked systems are becoming increasingly complex and diverse. The current simulation and runtime verification techniques do not provide support for developing such systems efficiently; moreover, the reliability of the simulated/verified systems is not thoroughly ensured. To address these challenges, the use of formal techniques to reason about network system development is growing, while at the same time, the mathematical background necessary for using formal techniques is a barrier for network designers to efficiently employ them. Thus, these techniques are not vastly used for developing networked systems. The objective of this thesis is to propose formal approaches for the development of reliable networked systems, by taking efficiency into account. With respect to reliability, we propose the architectural development of correct-by-construction networked system models. With respect to efficiency, we propose reusable network architectures as well as network development. At the core of our development methodology, we employ the abstraction and refinement techniques for the development and analysis of networked systems. We evaluate our proposal by employing the proposed architectures to a pervasive class of dynamic networks, i.e., wireless sensor network architectures as well as to a pervasive class of static networks, i.e., network-on-chip architectures. The ultimate goal of our research is to put forward the idea of building libraries of pre-proved rules for the efficient modelling, development, and analysis of networked systems. We take into account both qualitative and quantitative analysis of networks via varied formal tool support, using a theorem prover the Rodin platform and a statistical model checker the SMC-Uppaal.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Multiprocessor system-on-chip (MPSoC) designs utilize the available technology and communication architectures to meet the requirements of the upcoming applications. In MPSoC, the communication platform is both the key enabler, as well as the key differentiator for realizing efficient MPSoCs. It provides product differentiation to meet a diverse, multi-dimensional set of design constraints, including performance, power, energy, reconfigurability, scalability, cost, reliability and time-to-market. The communication resources of a single interconnection platform cannot be fully utilized by all kind of applications, such as the availability of higher communication bandwidth for computation but not data intensive applications is often unfeasible in the practical implementation. This thesis aims to perform the architecture-level design space exploration towards efficient and scalable resource utilization for MPSoC communication architecture. In order to meet the performance requirements within the design constraints, careful selection of MPSoC communication platform, resource aware partitioning and mapping of the application play important role. To enhance the utilization of communication resources, variety of techniques such as resource sharing, multicast to avoid re-transmission of identical data, and adaptive routing can be used. For implementation, these techniques should be customized according to the platform architecture. To address the resource utilization of MPSoC communication platforms, variety of architectures with different design parameters and performance levels, namely Segmented bus (SegBus), Network-on-Chip (NoC) and Three-Dimensional NoC (3D-NoC), are selected. Average packet latency and power consumption are the evaluation parameters for the proposed techniques. In conventional computing architectures, fault on a component makes the connected fault-free components inoperative. Resource sharing approach can utilize the fault-free components to retain the system performance by reducing the impact of faults. Design space exploration also guides to narrow down the selection of MPSoC architecture, which can meet the performance requirements with design constraints.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult as the current popular programming languages are inherently sequential and introducing parallelism is typically up to the programmer. Dataflow, however, is inherently parallel, describing an application as a directed graph, where nodes represent calculations and edges represent a data dependency in form of a queue. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit and thereby also the parallelism. Once a node have the su cient inputs available, the node can, independently of any other node, perform calculations, consume inputs, and produce outputs. Data ow models have existed for several decades and have become popular for describing signal processing applications as the graph representation is a very natural representation within this eld. Digital lters are typically described with boxes and arrows also in textbooks. Data ow is also becoming more interesting in other domains, and in principle, any application working on an information stream ts the dataflow paradigm. Such applications are, among others, network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be use within reconfigurable video coding. Describing a video coder as a data ow network instead of with conventional programming languages, makes the coder more readable as it describes how the video dataflows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for data ow to be more widely used. The explicit parallelism of a dataflow program is descriptive and enables an improved utilization of available processing units, however, the independent nodes also implies that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes are running concurrently on one processor core. There exist several data ow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable, with minimal scheduling overhead, to dynamic where each ring requires a ring rule to evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling, however, the strong encapsulation of dataflow nodes enables analysis and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with nding the few scheduling decisions that must be run-time, while most decisions are pre-calculated. The result is then an, as small as possible, set of static schedules that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information to generate a minimal but complete model to be used for model checking. The model must describe everything that may affect scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to nd the actual schedules, a set of scheduling strategies are de ned which are able to produce quasi-static schedulers for a wide range of applications. The results of this work show that actor composition with quasi-static scheduling can be used to transform data ow programs to t many different computer architecture with different type and number of cores. This in turn, enables dataflow to provide a more platform independent representation as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is in the context of design space exploration optimized by the development tools to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis presents a novel design paradigm, called Virtual Runtime Application Partitions (VRAP), to judiciously utilize the on-chip resources. As the dark silicon era approaches, where the power considerations will allow only a fraction chip to be powered on, judicious resource management will become a key consideration in future designs. Most of the works on resource management treat only the physical components (i.e. computation, communication, and memory blocks) as resources and manipulate the component to application mapping to optimize various parameters (e.g. energy efficiency). To further enhance the optimization potential, in addition to the physical resources we propose to manipulate abstract resources (i.e. voltage/frequency operating point, the fault-tolerance strength, the degree of parallelism, and the configuration architecture). The proposed framework (i.e. VRAP) encapsulates methods, algorithms, and hardware blocks to provide each application with the abstract resources tailored to its needs. To test the efficacy of this concept, we have developed three distinct self adaptive environments: (i) Private Operating Environment (POE), (ii) Private Reliability Environment (PRE), and (iii) Private Configuration Environment (PCE) that collectively ensure that each application meets its deadlines using minimal platform resources. In this work several novel architectural enhancements, algorithms and policies are presented to realize the virtual runtime application partitions efficiently. Considering the future design trends, we have chosen Coarse Grained Reconfigurable Architectures (CGRAs) and Network on Chips (NoCs) to test the feasibility of our approach. Specifically, we have chosen Dynamically Reconfigurable Resource Array (DRRA) and McNoC as the representative CGRA and NoC platforms. The proposed techniques are compared and evaluated using a variety of quantitative experiments. Synthesis and simulation results demonstrate VRAP significantly enhances the energy and power efficiency compared to state of the art.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The original contribution of this thesis to knowledge are novel digital readout architectures for hybrid pixel readout chips. The thesis presents asynchronous bus-based architecture, a data-node based column architecture and a network-based pixel matrix architecture for data transportation. It is shown that the data-node architecture achieves readout efficiency 99% with half the output rate as a bus-based system. The network-based solution avoids “broken” columns due to some manufacturing errors, and it distributes internal data traffic more evenly across the pixel matrix than column-based architectures. An improvement of > 10% to the efficiency is achieved with uniform and non-uniform hit occupancies. Architectural design has been done using transaction level modeling (TLM) and sequential high-level design techniques for reducing the design and simulation time. It has been possible to simulate tens of column and full chip architectures using the high-level techniques. A decrease of > 10 in run-time is observed using these techniques compared to register transfer level (RTL) design technique. Reduction of 50% for lines-of-code (LoC) for the high-level models compared to the RTL description has been achieved. Two architectures are then demonstrated in two hybrid pixel readout chips. The first chip, Timepix3 has been designed for the Medipix3 collaboration. According to the measurements, it consumes < 1 W/cm^2. It also delivers up to 40 Mhits/s/cm^2 with 10-bit time-over-threshold (ToT) and 18-bit time-of-arrival (ToA) of 1.5625 ns. The chip uses a token-arbitrated, asynchronous two-phase handshake column bus for internal data transfer. It has also been successfully used in a multi-chip particle tracking telescope. The second chip, VeloPix, is a readout chip being designed for the upgrade of Vertex Locator (VELO) of the LHCb experiment at CERN. Based on the simulations, it consumes < 1.5 W/cm^2 while delivering up to 320 Mpackets/s/cm^2, each packet containing up to 8 pixels. VeloPix uses a node-based data fabric for achieving throughput of 13.3 Mpackets/s from the column to the EoC. By combining Monte Carlo physics data with high-level simulations, it has been demonstrated that the architecture meets requirements of the VELO (260 Mpackets/s/cm^2 with efficiency of 99%).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis describes research in which genetic programming is used to automatically evolve shape grammars that construct three dimensional models of possible external building architectures. A completely automated fitness function is used, which evaluates the three dimensional building models according to different geometric properties such as surface normals, height, building footprint, and more. In order to evaluate the buildings on the different criteria, a multi-objective fitness function is used. The results obtained from the automated system were successful in satisfying the multiple objective criteria as well as creating interesting and unique designs that a human-aided system might not discover. In this study of evolutionary design, the architectures created are not meant to be fully functional and structurally sound blueprints for constructing a building, but are meant to be inspirational ideas for possible architectural designs. The evolved models are applicable for today's architectural industries as well as in the video game and movie industries. Many new avenues for future work have also been discovered and highlighted.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Les tâches de vision artificielle telles que la reconnaissance d’objets demeurent irrésolues à ce jour. Les algorithmes d’apprentissage tels que les Réseaux de Neurones Artificiels (RNA), représentent une approche prometteuse permettant d’apprendre des caractéristiques utiles pour ces tâches. Ce processus d’optimisation est néanmoins difficile. Les réseaux profonds à base de Machine de Boltzmann Restreintes (RBM) ont récemment été proposés afin de guider l’extraction de représentations intermédiaires, grâce à un algorithme d’apprentissage non-supervisé. Ce mémoire présente, par l’entremise de trois articles, des contributions à ce domaine de recherche. Le premier article traite de la RBM convolutionelle. L’usage de champs réceptifs locaux ainsi que le regroupement d’unités cachées en couches partageant les même paramètres, réduit considérablement le nombre de paramètres à apprendre et engendre des détecteurs de caractéristiques locaux et équivariant aux translations. Ceci mène à des modèles ayant une meilleure vraisemblance, comparativement aux RBMs entraînées sur des segments d’images. Le deuxième article est motivé par des découvertes récentes en neurosciences. Il analyse l’impact d’unités quadratiques sur des tâches de classification visuelles, ainsi que celui d’une nouvelle fonction d’activation. Nous observons que les RNAs à base d’unités quadratiques utilisant la fonction softsign, donnent de meilleures performances de généralisation. Le dernière article quand à lui, offre une vision critique des algorithmes populaires d’entraînement de RBMs. Nous montrons que l’algorithme de Divergence Contrastive (CD) et la CD Persistente ne sont pas robustes : tous deux nécessitent une surface d’énergie relativement plate afin que leur chaîne négative puisse mixer. La PCD à "poids rapides" contourne ce problème en perturbant légèrement le modèle, cependant, ceci génère des échantillons bruités. L’usage de chaînes tempérées dans la phase négative est une façon robuste d’adresser ces problèmes et mène à de meilleurs modèles génératifs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cette thèse porte sur une classe d'algorithmes d'apprentissage appelés architectures profondes. Il existe des résultats qui indiquent que les représentations peu profondes et locales ne sont pas suffisantes pour la modélisation des fonctions comportant plusieurs facteurs de variation. Nous sommes particulièrement intéressés par ce genre de données car nous espérons qu'un agent intelligent sera en mesure d'apprendre à les modéliser automatiquement; l'hypothèse est que les architectures profondes sont mieux adaptées pour les modéliser. Les travaux de Hinton (2006) furent une véritable percée, car l'idée d'utiliser un algorithme d'apprentissage non-supervisé, les machines de Boltzmann restreintes, pour l'initialisation des poids d'un réseau de neurones supervisé a été cruciale pour entraîner l'architecture profonde la plus populaire, soit les réseaux de neurones artificiels avec des poids totalement connectés. Cette idée a été reprise et reproduite avec succès dans plusieurs contextes et avec une variété de modèles. Dans le cadre de cette thèse, nous considérons les architectures profondes comme des biais inductifs. Ces biais sont représentés non seulement par les modèles eux-mêmes, mais aussi par les méthodes d'entraînement qui sont souvent utilisés en conjonction avec ceux-ci. Nous désirons définir les raisons pour lesquelles cette classe de fonctions généralise bien, les situations auxquelles ces fonctions pourront être appliquées, ainsi que les descriptions qualitatives de telles fonctions. L'objectif de cette thèse est d'obtenir une meilleure compréhension du succès des architectures profondes. Dans le premier article, nous testons la concordance entre nos intuitions---que les réseaux profonds sont nécessaires pour mieux apprendre avec des données comportant plusieurs facteurs de variation---et les résultats empiriques. Le second article est une étude approfondie de la question: pourquoi l'apprentissage non-supervisé aide à mieux généraliser dans un réseau profond? Nous explorons et évaluons plusieurs hypothèses tentant d'élucider le fonctionnement de ces modèles. Finalement, le troisième article cherche à définir de façon qualitative les fonctions modélisées par un réseau profond. Ces visualisations facilitent l'interprétation des représentations et invariances modélisées par une architecture profonde.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mémoire numérisé par la Division de la gestion de documents et des archives de l'Université de Montréal