927 results for graphics processor
Abstract:
Stereoscopic 3-D display is based on the true-to-life presentation of different perspectives for the right and the left eye. It is becoming increasingly important in medicine, architecture, design, computer games, and cinema, and possibly in television in the future. 3-D displays additionally reproduce spatial depth and can be roughly divided into four groups: stereoscopes and head-mounted displays, glasses-based systems, autostereoscopic displays, and true 3-D displays. Among these, the glasses-free autostereoscopic approach, which uses N ≥ 2 perspectives, has high potential. The best quality in this group can be achieved with integral photography, which encodes both horizontal and vertical parallax. However, the method is very costly and is therefore rarely used. The best compromise between performance and price is offered by precisely manufactured lenticular lens sheets (Linsenrasterscheiben, LRS), which are superior in light efficiency and optical properties to the previously known barrier masks. The ergonomically favorable multi-perspective 3-D display in particular requires a high physical monitor resolution. This is already fairly high in modern TFT displays. A further improvement by a theoretical factor of three is obtained by selectively addressing the individual, side-by-side subpixels in the colors red, green, and blue. This is made possible by the fact that the color resolution of the human visual system is about an order of magnitude lower than its luminance resolution. This allows the implementation of a subpixel filter that, in accordance with the physiological conditions, works with the YUV color model, which separates luminance and chrominance. Furthermore, slanting the lenses at a ratio of 1:6 proves advantageous. 
Color artifacts are minimized, and image sharpness is increased because the technologically unavoidable separating elements between the subpixels are magnified less systematically. The degree of slant can be chosen freely. In this sense the filtering is to be understood as adaptive to the slant angle, although this value is invariant for any particular 3-D monitor. The quantity to be maximized is the parameter perspective-pixels, the product of the number of perspectives N and the effective resolution per perspective. The ideal case of a threefold increase is not reached in practice. Measurements using test images and character recognition tests yielded a value of just over 2. This should nevertheless be regarded as a significant improvement in the quality of 3-D display. In the future, further improvements in the target quantity can be expected from new technologies with finer resolution than TFT, such as LCoS or OLED. A combination with the proposed filtering method will of course remain possible and, where appropriate, useful.
Abstract:
For the theoretical investigation of local phenomena (adsorption at surfaces, defects or impurities within a crystal, etc.) one can assume that the effects caused by the local disturbance are limited to the neighbouring particles. With this model, well known as the cluster approximation, an infinite system can be simulated by a much smaller segment of the surface (the cluster). The size of this segment varies strongly between systems. Calculations of the convergence of bond distance and binding energy of an adsorbed aluminum atom on an Al(100) surface showed that more than 100 atoms are necessary for a sufficient description of surface properties. However, systems of this size cannot be calculated with a fully quantum-mechanical approach because of the demands on computer memory and processor speed. We therefore developed an embedding procedure for the simulation of surfaces and solids, in which the whole system is partitioned into several parts that are treated differently: the internal part (the cluster), located near the adsorbate, is calculated completely self-consistently and is embedded into an environment, while the influence of the environment on the cluster enters the relativistic Kohn-Sham equations as an additional external potential. The procedure is based on density functional theory. This means, however, that the choice of the electronic density of the environment determines the quality of the embedding procedure. The environment density was modelled in three different ways: atomic densities; densities transferred from a large preceding calculation without embedding; and copied bulk densities. The embedding procedure was tested on the atomic adsorption of Al on Al(100) and Cu on Cu(100). For the Al system, the result was that, if the environment is chosen appropriately, only 9 embedded atoms are needed to reproduce the results of exact slab calculations. 
For the Cu system, calculations without the embedding procedure were carried out first, with the result that 60 atoms are already sufficient as a surface cluster. Using the embedding procedure, the same values were obtained with only 25 atoms. This is a substantial improvement, considering that the calculation time increases cubically with the number of atoms. With the embedding method, infinite systems can be treated by molecular methods. In addition, the program code was extended to allow molecular-dynamics simulations. Beyond the earlier calculations with fixed cores, it is now also possible to investigate the structures of small clusters and surfaces. A first application was the adsorption of Cu on Cu(100). We calculated the relaxed positions of the atoms located close to the adsorption site and then performed the fully quantum-mechanical calculation of this system. We repeated this procedure for different distances from the surface. Thus a realistic adsorption process could be examined for the first time. It should be noted that while doing the Cu reference calculations (without embedding) we began to parallelize the entire program code. Only because of this were the investigations of the 100-atom Cu surface clusters possible. Given the good efficiency of both the parallelization and the embedding procedure, we will be able to apply the combination in the future. This will help us to work in more of these areas, where it will be possible to bring in results of fully relativistic molecular calculations, which will be especially interesting for the regime of heavy systems.
Abstract:
The centralised control rooms of large industrial plants have separated people from the processes they should control. Perception is restricted mainly to the visual sense. Only telephone or radio links provide narrow-band voice communication with maintenance personnel down in the plant. Multimedia equipment can perceptually bring the operator back into the plant while physically keeping him in the comfortable and safe control room. This involves video and audio transmission from process components as well as sights and sounds artificially generated from measurements. Groupware systems support interaction between operators, engineers, and managers in different plants. With support from the German government, the state of Hessen, and industrial companies, the Laboratory for Systems Engineering and Human-Machine Systems at the University of Kassel is establishing an Experimental Multimedia Process Control Room. The core of this set-up consists of two high-performance graphics workstations linked to one of several process or vehicle simulators. The multimedia periphery includes video and teleconferencing equipment and a vibration and sound generation system.
Abstract:
The Scheme86 and the HP Precision Architectures represent different trends in computer processor design. The former uses wide micro-instructions, parallel hardware, and a low latency memory interface. The latter encourages pipelined implementation and visible interlocks. To compare the merits of these approaches, algorithms frequently encountered in numerical and symbolic computation were hand-coded for each architecture. Timings were done in simulators and the results were evaluated to determine the speed of each design. Based on these measurements, conclusions were drawn as to which aspects of each architecture are suitable for a high-performance computer.
Abstract:
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. 
We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
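The two scheduling decisions described in this abstract (which threads to load into the hardware contexts, and which context to switch to) can be illustrated with a minimal software sketch. This is not the hardware mechanism from the thesis: the `Thread` class, the convention that a larger number means a more critical thread, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int        # assumed convention: larger value = more critical
    ready: bool = True   # False while waiting on a long-latency operation


def load_contexts(threads, num_contexts):
    """Pick the highest-priority threads to occupy the hardware contexts."""
    ranked = sorted(threads, key=lambda t: t.priority, reverse=True)
    return ranked[:num_contexts]


def next_context(contexts):
    """On a context switch, choose the ready loaded thread with the highest priority.

    Returns None if every loaded thread is stalled.
    """
    ready = [t for t in contexts if t.ready]
    return max(ready, key=lambda t: t.priority) if ready else None


# Hypothetical workload: four threads competing for three contexts.
threads = [
    Thread("critical", priority=3),
    Thread("worker-a", priority=2),
    Thread("worker-b", priority=1),
    Thread("spinner", priority=0),
]
contexts = load_contexts(threads, num_contexts=3)
running = next_context(contexts)   # the highest-priority loaded thread
running.ready = False              # it issues a long-latency operation
fallback = next_context(contexts)  # the next-highest ready thread takes over
```

In this sketch the low-priority spinning thread never occupies a context while more critical threads are available, which is the effect the abstract attributes to prioritization for synchronization primitives.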
Abstract:
As the number of processors in distributed-memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot-spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared-memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test-and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson [ISCA19], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. 
If separate controllers for processing protocol requests are included, the protocol scales to 32k processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be able to be satisfied locally; at most 35% of the global references are allowed to reach the top level of the hierarchy.
Abstract:
Research on autonomous intelligent systems has focused on how robots can robustly carry out missions in uncertain and harsh environments with very little or no human intervention. Robotic execution languages such as RAPs, ESL, and TDL improve robustness by managing functionally redundant procedures for achieving goals. The model-based programming approach extends this by guaranteeing correctness of execution through pre-planning of non-deterministic timed threads of activities. Executing model-based programs effectively on distributed autonomous platforms requires distributing this pre-planning process. This thesis presents a distributed planner for model-based programs whose planning and execution are distributed among agents with widely varying levels of processor power and memory resources. We make two key contributions. First, we reformulate a model-based program, which describes cooperative activities, into a hierarchical dynamic simple temporal network. This enables efficient distributed coordination of robots and supports deployment on heterogeneous robots. Second, we introduce a distributed temporal planner, called DTP, which solves hierarchical dynamic simple temporal networks with the assistance of the distributed Bellman-Ford shortest path algorithm. The implementation of DTP has been demonstrated successfully on a wide range of randomly generated examples and on a pursuer-evader challenge problem in simulation.
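To illustrate why a shortest-path algorithm is relevant here, the following sketch checks the consistency of a plain simple temporal network with a centralized Bellman-Ford pass. It is not DTP and it is not distributed or hierarchical; the constraint format `(i, j, lo, hi)`, meaning `lo <= t_j - t_i <= hi`, and the function names are assumptions for illustration. The underlying fact is standard: a simple temporal network is consistent exactly when its distance graph has no negative cycle.

```python
def stn_distance_graph(constraints):
    """Turn constraints (i, j, lo, hi), meaning lo <= t_j - t_i <= hi, into weighted edges."""
    edges = []
    for i, j, lo, hi in constraints:
        edges.append((i, j, hi))    # encodes t_j - t_i <= hi
        edges.append((j, i, -lo))   # encodes t_i - t_j <= -lo
    return edges


def is_consistent(num_events, constraints):
    """Bellman-Ford negative-cycle check: consistent iff no negative cycle exists."""
    edges = stn_distance_graph(constraints)
    # Starting every event at 0 acts like a virtual source connected to all events.
    dist = [0.0] * num_events
    for _ in range(num_events):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True   # distances settled, so there is no negative cycle
    return False          # still relaxing after num_events passes: negative cycle


# Hypothetical plan with three events: activity 0->1 takes 10 to 20 time units,
# activity 1->2 takes 5 to 10, and the whole plan must finish within a deadline.
feasible = [(0, 1, 10, 20), (1, 2, 5, 10), (0, 2, 0, 25)]
infeasible = [(0, 1, 10, 20), (1, 2, 5, 10), (0, 2, 0, 12)]
```

In the second plan the two activities need at least 15 time units in total, which contradicts the deadline of 12, so the check reports an inconsistency.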
Abstract:
Image analysis and graphics synthesis can be achieved with learning techniques that use image examples directly, without physically-based 3D models. In our technique: -- the mapping from novel images to a vector of "pose" and "expression" parameters can be learned from a small set of example images using a function approximation technique that we call an analysis network; -- the inverse mapping from input "pose" and "expression" parameters to output images can be synthesized from a small set of example images and used to produce new images using a similar synthesis network. The techniques described here have several applications in computer graphics, special effects, interactive multimedia and very low bandwidth teleconferencing.
Abstract:
Human object recognition is generally considered to tolerate changes of the stimulus position in the visual field. A number of recent studies, however, have cast doubt on the completeness of translation invariance. In a new series of experiments we tried to investigate whether positional specificity of short-term memory is a general property of visual perception. We tested same/different discrimination of computer graphics models that were displayed at the same or at different locations of the visual field, and found complete translation invariance, regardless of the similarity of the animals and irrespective of direction and size of the displacement (Exp. 1 and 2). Decisions were strongly biased towards same decisions if stimuli appeared at a constant location, while after translation subjects displayed a tendency towards different decisions. Even if the spatial order of animal limbs was randomized ("scrambled animals"), no deteriorating effect of shifts in the field of view could be detected (Exp. 3). However, if the influence of single features was reduced (Exp. 4 and 5) small but significant effects of translation could be obtained. Under conditions that do not reveal an influence of translation, rotation in depth strongly interferes with recognition (Exp. 6). Changes of stimulus size did not reduce performance (Exp. 7). Tolerance to these object transformations seems to rely on different brain mechanisms, with translation and scale invariance being achieved in principle, while rotation invariance is not.
Abstract:
We describe the key role played by partial evaluation in the Supercomputing Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputing Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.
Abstract:
We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.
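The idea of splitting data exactly, rather than approximately, can be sketched sequentially. The code below emulates processors as Python lists and produces outputs of exactly the requested sizes. It is not the algorithm from the abstract: the parallel selection step is replaced by a simple merge, and the function names and the tie-breaking key `(value, processor id, local index)` are assumptions for illustration.

```python
from bisect import bisect_left
from heapq import merge
from itertools import islice


def exact_split_counts(local_sorted, rank):
    """For each processor, count how many of its elements are among the `rank` smallest overall.

    Ties are broken by (value, processor id, local index), so the counts always sum to `rank`.
    """
    keyed = [[(v, p, i) for i, v in enumerate(run)] for p, run in enumerate(local_sorted)]
    total = sum(len(run) for run in keyed)
    if rank >= total:
        return [len(run) for run in keyed]
    # Stand-in for parallel selection: find the key with global rank `rank` by merging.
    splitter = next(islice(merge(*keyed), rank, None))
    return [bisect_left(run, splitter) for run in keyed]


def exact_sort(inputs, out_sizes):
    """Sort data spread over several lists into outputs with exactly the requested sizes."""
    local_sorted = [sorted(run) for run in inputs]
    bounds = [0]
    for size in out_sizes:
        bounds.append(bounds[-1] + size)
    # cuts[k][p] = number of elements of processor p that belong before output k.
    cuts = [exact_split_counts(local_sorted, r) for r in bounds]
    outputs = []
    for k in range(len(out_sizes)):
        pieces = [run[cuts[k][p]:cuts[k + 1][p]] for p, run in enumerate(local_sorted)]
        outputs.append(list(merge(*pieces)))
    return outputs


# Hypothetical input: nine values spread over three "processors".
balanced = exact_sort([[9, 1, 5], [3, 3, 8], [7, 2, 6]], [3, 3, 3])
```

Because the cut positions are exact ranks, any output distribution can be requested, for example `[1, 2, 6]` instead of the perfectly balanced `[3, 3, 3]`, and duplicate values do not disturb the sizes.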
Abstract:
The memory hierarchy is the main bottleneck in modern computer systems as the gap between the speed of the processor and the memory continues to grow larger. The situation in embedded systems is even worse. The memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives such as performance, energy consumption, and area. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources to maximize performance. However, the traditional custom memory hierarchy design methodologies are phase-ordered. They separate the application optimization from the memory hierarchy architecture design, which tends to result in locally optimal solutions. In traditional hardware-software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation. However, utilizing reconfigurable logic to perform the memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing the memory hierarchy for embedded systems. The framework takes advantage of flexible reconfigurable logic to customize the memory hierarchy for specific applications. It combines the application optimization and the memory hierarchy design to obtain a globally optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.
Abstract:
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T_∞. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT_∞) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T_∞). In contrast, SP-hybrid obtains linear speed-up when P = O(√(T₁/T_∞)), but the work is increased by a factor of O(lg n).
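The "English-Hebrew" labeling idea can be shown on a fixed series-parallel parse tree. The sketch below is a static version, not the on-the-fly SP-order algorithm and without its order-maintenance data structure; the `Leaf` and `Node` classes and the function names are assumptions for illustration. The English order visits children left to right everywhere, the Hebrew order visits the children of parallel nodes right to left, and one thread precedes another exactly when it comes first in both orders.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str                      # a thread


@dataclass
class Node:
    kind: str                      # "S" for series, "P" for parallel
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def label(tree):
    """Return {thread name: (English index, Hebrew index)} for an SP parse tree."""
    english, hebrew = [], []

    def walk(node, order, swap_parallel):
        if isinstance(node, Leaf):
            order.append(node.name)
            return
        first, second = node.left, node.right
        if swap_parallel and node.kind == "P":
            first, second = second, first   # Hebrew order: P-children right to left
        walk(first, order, swap_parallel)
        walk(second, order, swap_parallel)

    walk(tree, english, False)
    walk(tree, hebrew, True)
    h_index = {name: i for i, name in enumerate(hebrew)}
    return {name: (i, h_index[name]) for i, name in enumerate(english)}


def precedes(labels, u, v):
    """u executes logically before v iff u comes first in both orders."""
    return labels[u][0] < labels[v][0] and labels[u][1] < labels[v][1]


def parallel(labels, u, v):
    """Two distinct threads are logically parallel iff neither precedes the other."""
    return u != v and not precedes(labels, u, v) and not precedes(labels, v, u)


# Hypothetical program: thread a, then b and c in parallel, then d.
tree = Node("S", Leaf("a"), Node("S", Node("P", Leaf("b"), Leaf("c")), Leaf("d")))
labels = label(tree)
```

Here `b` and `c` swap places between the two orders, so they are reported as parallel, while `a` and `d` keep their relative positions with respect to every other thread and are therefore in series with them.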
Abstract:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics