803 resultados para Multiprocessor computer architectures
Resumo:
Thesis (M. S.)--University of Illinois at Urbana-Champaign.
Resumo:
"This work was submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering, June 1967, and was supported in part by the AEC under Contract No. USAEC AT(11-1)1018.
Desenvolvimento de uma arquitetura reconfigurável para o processamento de modelos no ambiente ABACUS
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
With the latest advances in the area of advanced computer architectures we are seeing already large scale machines at petascale level and we are discussing exascale computing. All these require efficient scalable algorithms in order to bridge the performance gap. In this paper examples of various approaches of designing scalable algorithms for such advanced architectures will be given and the corresponding properties of these algorithms will be outlined and discussed. Examples will outline such scalable algorithms applied to large scale problems in the area Computational Biology, Environmental Modelling etc. The key properties of such advanced and scalable algorithms will be outlined.
Resumo:
We present a method for measuring single spins embedded in a solid by probing two-electron systems with a single-electron transistor (SET). Restrictions imposed by the Pauli principle on allowed two-electron states mean that the spin state of such systems has a profound impact on the orbital states (positions) of the electrons, a parameter which SET's are extremely well suited to measure. We focus on a particular system capable of being fabricated with current technology: a Te double donor in Si adjacent to a Si/SiO2, interface and lying directly beneath the SET island electrode, and we outline a measurement strategy capable of resolving single-electron and nuclear spins in this system. We discuss the limitations of the measurement imposed by spin scattering arising from fluctuations emanating from the SET and from lattice phonons. We conclude that measurement of single spins, a necessary requirement for several proposed quantum computer architectures, is feasible in Si using this strategy.
Resumo:
We theoretically study the Hilbert space structure of two neighboring P-donor electrons in silicon-based quantum computer architectures. To use electron spins as qubits, a crucial condition is the isolation of the electron spins from their environment, including the electronic orbital degrees of freedom. We provide detailed electronic structure calculations of both the single donor electron wave function and the two-electron pair wave function. We adopted a molecular orbital method for the two-electron problem, forming a basis with the calculated single donor electron orbitals. Our two-electron basis contains many singlet and triplet orbital excited states, in addition to the two simple ground state singlet and triplet orbitals usually used in the Heitler-London approximation to describe the two-electron donor pair wave function. We determined the excitation spectrum of the two-donor system, and study its dependence on strain, lattice position, and interdonor separation. This allows us to determine how isolated the ground state singlet and triplet orbitals are from the rest of the excited state Hilbert space. In addition to calculating the energy spectrum, we are also able to evaluate the exchange coupling between the two donor electrons, and the double occupancy probability that both electrons will reside on the same P donor. These two quantities are very important for logical operations in solid-state quantum computing devices, as a large exchange coupling achieves faster gating times, while the magnitude of the double occupancy probability can affect the error rate.
Resumo:
Over time, XML markup language has acquired a considerable importance in applications development, standards definition and in the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing of large documents in appropriated manners. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files
Resumo:
With the advent of High performance computing, it is now possible to achieve orders of magnitude performance and computation e ciency gains over conventional computer architectures. This thesis explores the potential of using high performance computing to accelerate whole genome alignment. A parallel technique is applied to an algorithm for whole genome alignment, this technique is explained and some experiments were carried out to test it. This technique is based in a fair usage of the available resource to execute genome alignment and how this can be used in HPC clusters. This work is a rst approximation to whole genome alignment and it shows the advantages of parallelism and some of the drawbacks that our technique has. This work describes the resource limitations of current WGA applications when dealing with large quantities of sequences. It proposes a parallel heuristic to distribute the load and to assure that alignment quality is mantained.
Resumo:
With the shift towards many-core computer architectures, dataflow programming has been proposed as one potential solution for producing software that scales to a varying number of processor cores. Programming for parallel architectures is considered difficult as the current popular programming languages are inherently sequential and introducing parallelism is typically up to the programmer. Dataflow, however, is inherently parallel, describing an application as a directed graph, where nodes represent calculations and edges represent a data dependency in form of a queue. These queues are the only allowed communication between the nodes, making the dependencies between the nodes explicit and thereby also the parallelism. Once a node have the su cient inputs available, the node can, independently of any other node, perform calculations, consume inputs, and produce outputs. Data ow models have existed for several decades and have become popular for describing signal processing applications as the graph representation is a very natural representation within this eld. Digital lters are typically described with boxes and arrows also in textbooks. Data ow is also becoming more interesting in other domains, and in principle, any application working on an information stream ts the dataflow paradigm. Such applications are, among others, network protocols, cryptography, and multimedia applications. As an example, the MPEG group standardized a dataflow language called RVC-CAL to be use within reconfigurable video coding. Describing a video coder as a data ow network instead of with conventional programming languages, makes the coder more readable as it describes how the video dataflows through the different coding tools. While dataflow provides an intuitive representation for many applications, it also introduces some new problems that need to be solved in order for data ow to be more widely used. The explicit parallelism of a dataflow program is descriptive and enables an improved utilization of available processing units, however, the independent nodes also implies that some kind of scheduling is required. The need for efficient scheduling becomes even more evident when the number of nodes is larger than the number of processing units and several nodes are running concurrently on one processor core. There exist several data ow models of computation, with different trade-offs between expressiveness and analyzability. These vary from rather restricted but statically schedulable, with minimal scheduling overhead, to dynamic where each ring requires a ring rule to evaluated. The model used in this work, namely RVC-CAL, is a very expressive language, and in the general case it requires dynamic scheduling, however, the strong encapsulation of dataflow nodes enables analysis and the scheduling overhead can be reduced by using quasi-static, or piecewise static, scheduling techniques. The scheduling problem is concerned with nding the few scheduling decisions that must be run-time, while most decisions are pre-calculated. The result is then an, as small as possible, set of static schedules that are dynamically scheduled. To identify these dynamic decisions and to find the concrete schedules, this thesis shows how quasi-static scheduling can be represented as a model checking problem. This involves identifying the relevant information to generate a minimal but complete model to be used for model checking. The model must describe everything that may affect scheduling of the application while omitting everything else in order to avoid state space explosion. This kind of simplification is necessary to make the state space analysis feasible. For the model checker to nd the actual schedules, a set of scheduling strategies are de ned which are able to produce quasi-static schedulers for a wide range of applications. The results of this work show that actor composition with quasi-static scheduling can be used to transform data ow programs to t many different computer architecture with different type and number of cores. This in turn, enables dataflow to provide a more platform independent representation as one application can be fitted to a specific processor architecture without changing the actual program representation. Instead, the program representation is in the context of design space exploration optimized by the development tools to fit the target platform. This work focuses on representing the dataflow scheduling problem as a model checking problem and is implemented as part of a compiler infrastructure. The thesis also presents experimental results as evidence of the usefulness of the approach.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
With the development of high-level languages for new computer architectures comes the need for appropriate debugging tools as well. One method for meeting this need would be to develop, from scratch, a symbolic debugger with the introduction of each new language implementation for any given architecture. This, however, seems to require unnecessary duplication of effort among developers. This paper describes Maygen, a "debugger generation system," designed to efficiently provide the desired language-dependent and architecture-dependent debuggers. A prototype of the Maygen system has been implemented and is able to handle the semantically different languages of C and OPAL.
Resumo:
These notes have been issued on a small scale in 1983 and 1987 and on request at other times. This issue follows two items of news. First, WaIter Colquitt and Luther Welsh found the 'missed' Mersenne prime M110503 and advanced the frontier of complete Mp-testing to 139,267. In so doing, they terminated Slowinski's significant string of four consecutive Mersenne primes. Secondly, a team of five established a non-Mersenne number as the largest known prime. This result terminated the 1952-89 reign of Mersenne primes. All the original Mersenne numbers with p < 258 were factorised some time ago. The Sandia Laboratories team of Davis, Holdridge & Simmons with some little assistance from a CRAY machine cracked M211 in 1983 and M251 in 1984. They contributed their results to the 'Cunningham Project', care of Sam Wagstaff. That project is now moving apace thanks to developments in technology, factorisation and primality testing. New levels of computer power and new computer architectures motivated by the open-ended promise of parallelism are now available. Once again, the suppliers may be offering free buildings with the computer. However, the Sandia '84 CRAY-l implementation of the quadratic-sieve method is now outpowered by the number-field sieve technique. This is deployed on either purpose-built hardware or large syndicates, even distributed world-wide, of collaborating standard processors. New factorisation techniques of both special and general applicability have been defined and deployed. The elliptic-curve method finds large factors with helpful properties while the number-field sieve approach is breaking down composites with over one hundred digits. The material is updated on an occasional basis to follow the latest developments in primality-testing large Mp and factorising smaller Mp; all dates derive from the published literature or referenced private communications. Minor corrections, additions and changes merely advance the issue number after the decimal point. The reader is invited to report any errors and omissions that have escaped the proof-reading, to answer the unresolved questions noted and to suggest additional material associated with this subject.
Resumo:
[EN]A new parallel algorithm for simultaneous untangling and smoothing of tetrahedral meshes is proposed in this paper. We provide a detailed analysis of its performance on shared-memory many-core computer architectures. This performance analysis includes the evaluation of execution time, parallel scalability, load balancing, and parallelism bottlenecks. Additionally, we compare the impact of three previously published graph coloring procedures on the performance of our parallel algorithm. We use six benchmark meshes with a wide range of sizes. Using these experimental data sets, we describe the behavior of the parallel algorithm for different data sizes. We demonstrate that this algorithm is highly scalable when it runs on two different high-performance many-core computers with up to 128 processors...
Resumo:
This paper reports a model of the mammalian retina as well as an interpretation of some functions of the visual cortex. Its main objective is to simulate some of the behaviors observed at the different retina cells depending on the characteristics of the light impinging onto the photoreceptors. This simulation is carried out with a simple structure employed previously as basic building block of some optical computer architectures. Its possibility to perform any type of Boolean function allows a wide range of behaviors.
Resumo:
Motivated by applications to quantum computer architectures we study the change in the exchange interaction between neighbouring phosphorus donor electrons in silicon due to the application of voltage biases to surface control electrodes. These voltage biases create electro-static fields within the crystal substrate, perturbing the states of the donor electrons and thus altering the strength of the exchange interaction between them. We find that control gates of this kind can be used to either enhance or reduce the strength of the interaction, by an amount that depends both on the magnitude and orientation of the donor separation.