966 results for "memory access complexity"


Relevance: 80.00%

Abstract:

In many areas of industrial manufacturing, such as the automotive industry, digital mock-ups are used so that the development of complex machines can be supported by computer systems as effectively as possible. Motion-planning algorithms play an important role here in ensuring that these digital prototypes can actually be assembled without collisions. Over the last decades, sampling-based methods have proven particularly successful for this task. They generate a large number of random poses for the object being assembled or disassembled and use a collision-detection mechanism to check each pose for validity. Collision detection therefore plays an essential role in the design of efficient motion-planning algorithms. A difficulty for this class of planners is posed by so-called "narrow passages", which occur wherever the freedom of motion of the objects being planned is severely restricted. In such places it can be hard to find a sufficient number of collision-free samples, and more sophisticated techniques may be needed to achieve good algorithmic performance.

This thesis is divided into two parts. In the first part, we investigate parallel collision-detection algorithms. Since we target sampling-based motion planners, we choose a problem setting in which the same two objects are always tested for collision, but in a large number of different poses. We implement and compare several methods that rely on bounding volume hierarchies (BVHs) and hierarchical grids as acceleration structures. All of the described methods were parallelized across multiple CPU cores. In addition, we compare various CUDA kernels for performing BVH-based collision tests on the GPU. Besides different ways of distributing the work across the parallel GPU threads, we investigate the effect of different memory-access patterns on the performance of the resulting algorithms. Furthermore, we present a set of approximate collision tests based on the described methods. When lower test accuracy is tolerable, these tests yield a further performance improvement.

In the second part of the thesis, we describe a parallel, sampling-based motion planner that we designed to handle highly complex problems with multiple narrow passages. The method works in two phases. The basic idea is to conceptually allow minor errors in the first planning phase in order to increase planning efficiency, and then to repair the resulting path in a second phase. The planner used in Phase I is based on so-called Expansive Space Trees. In addition, we equipped the planner with a push-out operation that resolves minor collisions and thereby increases efficiency in regions of restricted freedom of motion. Optionally, our implementation allows the use of approximate collision tests. This further lowers the accuracy of the first planning phase but also yields a further performance gain. The motion paths produced by Phase I may therefore not be completely collision-free. To repair these paths, we designed a novel planning algorithm that plans a new, collision-free motion path locally, restricted to a small neighborhood around the existing path.

We tested the described algorithm on a class of new, difficult metal puzzles, some of which contain multiple narrow passages. To our knowledge, no collection of benchmarks of comparable complexity is publicly available, and we found no description of comparably complex benchmarks in the motion-planning literature.
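Part one of the thesis repeatedly tests the same two objects for collision in a large number of different poses. The Python sketch below illustrates that setting under heavy simplifying assumptions that are not from the thesis: objects are approximated by spheres instead of meshes, poses are pure translations, and the bounding hierarchy has only two levels (one root sphere over the leaf spheres). All names (`SphereObject`, `collides`, `sample_free_poses`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

Sphere = tuple[float, float, float, float]  # (x, y, z, radius)


def spheres_overlap(a: Sphere, b: Sphere) -> bool:
    """True if the two spheres interpenetrate (mere touching does not count)."""
    return math.dist(a[:3], b[:3]) < a[3] + b[3]


def translate(s: Sphere, t: tuple[float, float, float]) -> Sphere:
    return (s[0] + t[0], s[1] + t[1], s[2] + t[2], s[3])


@dataclass
class SphereObject:
    """Hypothetical rigid object made of spheres, with a two-level bounding hierarchy."""
    leaves: list[Sphere]
    root: Sphere = field(init=False)

    def __post_init__(self) -> None:
        # Root bounding sphere: centroid of the leaf centres, radius enclosing every leaf.
        n = len(self.leaves)
        c = tuple(sum(s[i] for s in self.leaves) / n for i in range(3))
        r = max(math.dist(c, s[:3]) + s[3] for s in self.leaves)
        self.root = (c[0], c[1], c[2], r)


def collides(fixed: SphereObject, moving: SphereObject, t) -> bool:
    """Test `moving`, translated by t, against `fixed`."""
    if not spheres_overlap(fixed.root, translate(moving.root, t)):
        return False  # cheap rejection at the roots of both hierarchies
    return any(spheres_overlap(a, translate(b, t))
               for a in fixed.leaves for b in moving.leaves)


def sample_free_poses(fixed, moving, n_samples, extent, seed=0):
    """Draw random translations in [-extent, extent]^3 and keep the collision-free ones.

    Each sample is tested independently of all others, so this loop could be
    split across CPU cores or mapped to one GPU thread per pose."""
    rng = random.Random(seed)
    free = []
    for _ in range(n_samples):
        t = tuple(rng.uniform(-extent, extent) for _ in range(3))
        if not collides(fixed, moving, t):
            free.append(t)
    return free
```

With a real BVH the root test is applied recursively down the tree, which is where memory-access patterns start to matter on a GPU.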

Relevance: 80.00%

Abstract:

We describe Janus, a massively parallel FPGA-based computer optimized for the simulation of spin glasses, theoretical models of the behavior of glassy materials. Compared with GPUs or many-core processors, FPGAs offer a complementary approach to massively parallel computing. In particular, our model problem is formulated in terms of binary variables, so floating-point operations can be almost completely avoided. The FPGA architecture allows us to run many independent threads with almost no memory-access latency, updating up to 1024 spins per cycle. We describe Janus in detail and summarize the physics results obtained in four years of operation of this machine. We discuss two types of physics applications: long simulations of very large systems, which try to mimic and explain the experimentally observed non-equilibrium dynamics, and low-temperature equilibrium simulations using an artificial parallel-tempering dynamics. The time scale of our non-equilibrium simulations spans eleven orders of magnitude (from picoseconds to a tenth of a second). Our equilibrium simulations, on the other hand, are unprecedented both in the low temperatures reached and in the size of the systems brought to equilibrium. A finite-time scaling ansatz emerges from a detailed comparison of the two sets of simulations. Janus has made it possible to perform spin-glass simulations that would take several decades on more conventional architectures. The paper ends with an assessment of the potential of future versions of the Janus architecture based on state-of-the-art technology.
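To make "binary variables, almost no floating point, many independent updates" concrete, here is a minimal Python sketch of single-spin Metropolis dynamics for a 3D Edwards-Anderson spin glass with random ±1 couplings on a periodic lattice. The model choice, the checkerboard update order, and the integer-threshold acceptance test are assumptions for illustration; the abstract does not specify Janus's update scheme or random-number generator. Energy changes are small integers, so acceptance reduces to comparing a 32-bit random integer against a precomputed table.

```python
import math
import random


def random_instance(L, seed):
    """Random +/-1 couplings and spins on an L^3 periodic lattice (L must be even)."""
    assert L % 2 == 0, "checkerboard decomposition needs an even L"
    rng = random.Random(seed)

    def cube():
        return [[[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)] for _ in range(L)]

    J = [cube() for _ in range(3)]  # J[d][x][y][z]: bond from (x,y,z) to its +d neighbour
    return J, cube()


def local_field(J, s, x, y, z):
    """Sum of J_ij * s_j over the six neighbours of site (x, y, z)."""
    L = len(s)
    xp, xm = (x + 1) % L, (x - 1) % L
    yp, ym = (y + 1) % L, (y - 1) % L
    zp, zm = (z + 1) % L, (z - 1) % L
    return (J[0][x][y][z] * s[xp][y][z] + J[0][xm][y][z] * s[xm][y][z]
            + J[1][x][y][z] * s[x][yp][z] + J[1][x][ym][z] * s[x][ym][z]
            + J[2][x][y][z] * s[x][y][zp] + J[2][x][y][zm] * s[x][y][zm])


def energy(J, s):
    """H = -sum over bonds of J_ij s_i s_j, each bond counted once via its + direction."""
    L = len(s)
    e = 0
    for x in range(L):
        for y in range(L):
            for z in range(L):
                e -= s[x][y][z] * (J[0][x][y][z] * s[(x + 1) % L][y][z]
                                   + J[1][x][y][z] * s[x][(y + 1) % L][z]
                                   + J[2][x][y][z] * s[x][y][(z + 1) % L])
    return e


def acceptance_thresholds(beta):
    """Uphill moves (dE = 4, 8, 12) are accepted if a 32-bit random integer < threshold."""
    return {dE: int(math.exp(-beta * dE) * 2**32) for dE in (4, 8, 12)}


def checkerboard_sweep(J, s, thresholds, rng):
    """One Metropolis sweep: first all 'even' sites, then all 'odd' sites."""
    L = len(s)
    for colour in (0, 1):
        # Every neighbour of a site has the other colour, so the updates within
        # one colour pass do not depend on each other and could run in parallel.
        for x in range(L):
            for y in range(L):
                for z in range(L):
                    if (x + y + z) % 2 != colour:
                        continue
                    dE = 2 * s[x][y][z] * local_field(J, s, x, y, z)
                    if dE <= 0 or rng.getrandbits(32) < thresholds[dE]:
                        s[x][y][z] = -s[x][y][z]
```

The checkerboard split is what lets hardware update a whole sublattice at once; the only per-spin work is a few integer multiply-adds and one comparison.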

Relevance: 80.00%

Abstract:

Retrieving large amounts of information over wide area networks, including the Internet, is problematic because of response latency, the lack of direct memory access to the resources serving the data, and fault tolerance. This paper describes a design pattern for handling the results of queries that return large amounts of data. Typically, such queries are issued by a client process across a wide area network (or the Internet), through one or more middle tiers, to a relational database residing on a remote server. The solution combines several data-retrieval strategies: iterators that traverse the data sets and give the client an appropriate level of abstraction, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.
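The combination of iterators, double-buffering, a background retrieval thread, and query slicing can be sketched in a few lines of Python. This is an illustration under assumptions, not the Oracle implementation: `fetch_slice(offset, limit)` is a hypothetical callback standing in for one slice of the remote query (for example an OFFSET/LIMIT or key-range query), and a single worker thread prefetches the next slice while the client consumes the current one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence, TypeVar

Row = TypeVar("Row")


def sliced_rows(fetch_slice: Callable[[int, int], Sequence[Row]],
                slice_size: int) -> Iterator[Row]:
    """Iterate over a large remote result set one row at a time.

    While the caller drains the current buffer, a background thread is already
    filling the next one (double-buffering). A slice shorter than slice_size
    marks the end of the result set."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    with ThreadPoolExecutor(max_workers=1) as pool:
        offset = 0
        pending = pool.submit(fetch_slice, offset, slice_size)
        while pending is not None:
            current = pending.result()  # blocks only if the prefetch has not finished
            offset += len(current)
            # A full slice means more rows may follow: start fetching them right away.
            pending = (pool.submit(fetch_slice, offset, slice_size)
                       if len(current) == slice_size else None)
            yield from current
```

The client only sees a plain iterator; slice size trades round trips against buffer memory, and a retry policy around `fetch_slice` would be the natural place to add fault tolerance.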

Relevance: 80.00%

Abstract:

The high performance computing community has traditionally focused solely on reducing execution time, but in recent years the optimization of energy consumption has become a major concern. Reducing energy usage without degrading performance requires energy-efficient hardware platforms together with energy-aware algorithms and computational kernels. The solution of linear systems is a key operation in many scientific and engineering problems. Its relevance has motivated a considerable amount of work, and consequently high performance solvers exist for a wide variety of hardware platforms. In this work, we aim to develop a high performance, energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm, making efficient use of both the target hardware and memory access. The experimental evaluation shows that the new proposal achieves significant savings in both execution time and energy consumption compared with the state-of-the-art solvers for this platform.
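For readers unfamiliar with the Gauss-Huard algorithm: it is a variant of Gauss-Jordan elimination that, at step k, first reduces row k against the rows already processed, then pivots over columns, normalizes, and eliminates column k from the rows above. The NumPy sketch below is an unblocked, sequential version written for illustration; the paper's solvers are CPU-GPU implementations for the Jetson TK1 whose internals the abstract does not describe, and the function name `gauss_huard_solve` is hypothetical.

```python
import numpy as np


def gauss_huard_solve(A, b):
    """Solve A x = b with the Gauss-Huard method and column pivoting (unblocked sketch)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Work on the augmented matrix [A | b]; column n holds the right-hand side.
    M = np.hstack([A, np.asarray(b, dtype=float).reshape(n, 1)])
    perm = np.arange(n)  # perm[j] = index of the unknown currently stored in column j
    for k in range(n):
        # 1) Zero the first k entries of row k using rows 0..k-1, which are already
        #    reduced to unit vectors in columns 0..k-1.
        M[k, k:] -= M[k, :k] @ M[:k, k:]
        M[k, :k] = 0.0
        # 2) Column pivoting: move the largest |entry| of row k into column k.
        p = k + int(np.argmax(np.abs(M[k, k:n])))
        if M[k, p] == 0.0:
            raise np.linalg.LinAlgError("matrix is singular")
        if p != k:
            M[:, [k, p]] = M[:, [p, k]]
            perm[[k, p]] = perm[[p, k]]
        # 3) Normalize row k so that its pivot becomes 1.
        M[k, k + 1:] /= M[k, k]
        M[k, k] = 1.0
        # 4) Eliminate column k from the rows above (the Gauss-Jordan-style step).
        M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])
        M[:k, k] = 0.0
    x = np.empty(n)
    x[perm] = M[:, n]  # undo the column permutation
    return x
```

Unlike plain Gauss-Jordan, step 1 touches only one row at a time, which keeps the operation count close to that of LU-based solvers while avoiding a separate triangular solve.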

Relevance: 50.00%

Abstract:

Issued also as Thesis (Ph. D.) University of Chicago, 1908.

Relevance: 40.00%

Abstract:

Distraction in the workplace is increasingly common in the information age. Several tasks and sources of information compete for a worker's limited cognitive capacities in human-computer interaction (HCI). In some situations even very brief interruptions can have detrimental effects on memory. Yet in other situations, where people are continuously interrupted, virtually no interruption costs emerge. This dissertation attempts to reveal the mental conditions and causal mechanisms that differentiate the two outcomes. The explanation, building on the theory of long-term working memory (LTWM; Ericsson and Kintsch, 1995), focuses on the active, skillful aspects of human cognition that enable task information to be stored beyond the temporary and unstable storage provided by short-term working memory (STWM). Its key postulate is the retrieval structure: an abstract, hierarchical knowledge representation built into long-term memory that can be used to encode, update, and retrieve the products of cognitive processes carried out during skilled task performance. If certain criteria of practice and task processing are met, LTWM allows large representations to be stored for long periods, yet these representations can be accessed with the accuracy, reliability, and speed typical of STWM. The main thesis of the dissertation is that the ability to endure interruptions depends on the efficiency with which LTWM can be recruited for maintaining information. An observational study and a field experiment provide ecological evidence for this thesis. Mobile users were found to carry out heavy interleaving and sequencing of tasks while interacting, and they exhibited several intricate time-sharing strategies to orchestrate interruptions in a way sensitive to both external and internal demands. Interruptions are inevitable, because they arise as natural consequences of the top-down and bottom-up control of multitasking.
In this process, the function of LTWM is to keep some representations ready for reactivation and others in a more passive state to prevent interference. The psychological reality of the main thesis received confirmatory evidence in a series of laboratory experiments. They indicate that, once encoded into LTWM, task representations are safeguarded from interruptions regardless of their intensity, complexity, or pacing. However, when LTWM cannot be deployed, the problems posed by interference in long-term memory and by the limited capacity of STWM surface. A major contribution of the dissertation is the analysis of when users must resort to poorer maintenance strategies, such as temporal cues and STWM-based rehearsal. First, one experiment showed that task orientations can be associated with radically different patterns of retrieval cue encodings. Thus, the way the interface is processed determines which features will be available as retrieval cues and which must be maintained by other means. Another study demonstrated that if the speed of encoding into LTWM, a skill-dependent parameter, is slower than the processing speed the task allows, interruption costs emerge. Contrary to the predictions of competing theories, these costs turned out to involve intrusions in addition to omissions. Finally, in rapid, visually oriented interaction, perceptual-procedural expectations were found to guide task resumption, and neither STWM nor LTWM is used because access is too slow. These findings imply a change in thinking about the design of interfaces. Several novel design principles are presented, based on the idea of supporting the deployment of LTWM in the main task.

Relevance: 40.00%

Abstract:

A half-duplex constrained non-orthogonal cooperative multiple access (NCMA) protocol is proposed for transmitting information from N users to a single destination over a wireless fading channel. Transmission in this protocol comprises a broadcast phase and a cooperation phase. In the broadcast phase, the users take turns broadcasting their data to all other users and the destination, orthogonally in time. In the cooperation phase, each user transmits a linear function of what it received from all other users as well as its own data. In the orthogonal extension of cooperative relay protocols to cooperative multiple access channels, only one user is treated as a source at any point in time, while all other users behave as relays and do not transmit their own data; the NCMA protocol relaxes this built-in orthogonality and hence allows a more spectrally efficient use of resources. Code design criteria for achieving full diversity of N in the NCMA protocol are derived using pairwise error probability (PEP) analysis, and it is shown that full diversity can be achieved with a minimum total duration of 2N - 1 channel uses. An explicit construction of full-diversity codes is then provided for an arbitrary number of users. Since the maximum likelihood decoding complexity grows exponentially with the number of users, the notion of g-group decodable codes is introduced for this setup, and a set of necessary and sufficient conditions is also obtained.

Relevance: 40.00%

Abstract:

Polycrystalline strontium titanate (SrTiO3) films were prepared by a pulsed laser deposition technique on p-type silicon and platinum-coated silicon substrates. The films exhibited good structural and dielectric properties which were sensitive to the processing conditions. The small signal dielectric constant and dissipation factor at a frequency of 100 kHz were about 225 and 0.03 respectively. The capacitance-voltage (C-V) characteristics in metal-insulator-semiconductor structures exhibited anomalous frequency dispersion behavior and a hysteresis effect. The hysteresis in the C-V curve was found to be about 1 V and of a charge injection type. The density of interface states was about 1.79 × 10^12 cm^-2. The charge storage density was found to be 40 fC/µm² at an applied electric field of 200 kV/cm. Studies on current-voltage characteristics indicated an ohmic nature at lower voltages and space charge conduction at higher voltages. The films also exhibited excellent time-dependent dielectric breakdown behavior.
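As a consistency check on the reported figures, the stored charge density of a linear dielectric is Q = ε0 εr E. The short Python calculation below plugs in the reported small-signal dielectric constant (225) and field (200 kV/cm); it assumes εr stays constant at that field, which real SrTiO3 films need not satisfy, so it is an order-of-magnitude check rather than the paper's method.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018)


def charge_density_fC_per_um2(eps_r: float, field_kV_per_cm: float) -> float:
    """Stored charge density Q = eps0 * eps_r * E of a linear dielectric, in fC/um^2."""
    field_V_per_m = field_kV_per_cm * 1e3 * 1e2  # kV/cm -> V/m
    q_C_per_m2 = EPS0 * eps_r * field_V_per_m     # C/m^2
    return q_C_per_m2 * 1e15 / 1e12               # fC per um^2 (1 m^2 = 1e12 um^2)


print(f"{charge_density_fC_per_um2(225, 200):.1f} fC/um^2")  # prints 39.8 fC/um^2
```

The result, about 39.8 fC/µm², agrees with the reported 40 fC/µm².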

Relevance: 40.00%

Abstract:

Software transactional memory (STM) has been proposed as a promising programming paradigm for shared-memory multi-threaded programs, as an alternative to conventional lock-based synchronization primitives. Typical STM implementations employ a conflict detection scheme that works with a uniform access granularity, tracking shared data accesses either at word/cache-line level or at object level. It is well known that no single fixed access tracking granularity can reduce false conflicts without adversely affecting concurrency. A fine granularity, while improving concurrency, can hurt performance through lock aliasing, lock validation overheads, and additional cache pressure. A coarse granularity, on the other hand, can hurt performance through reduced concurrency. Thus, a fixed or uniform granularity access tracking (UGAT) scheme is in general application-unaware and rarely matches the access patterns of individual applications or parts of an application, leading to sub-optimal performance for different parts of the application. To mitigate these disadvantages of UGAT, we propose a Variable Granularity Access Tracking (VGAT) scheme. In our compiler-based approach, the compiler uses inter-procedural whole-program static analysis to select the access tracking granularity for each shared data structure of the application based on the application's data access pattern. We describe our prototype VGAT scheme, using TL2 as the underlying STM implementation. Our experimental results show that the VGAT-STM scheme improves the performance of the STAMP benchmarks by 1.87% to 21.2%.
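The trade-off between fine and coarse tracking can be seen in how an STM maps addresses to locks. The Python sketch below assumes a simplified scheme in the style of word-based STMs such as TL2, where each address is hashed into a fixed-size lock table; it only compares write sets, and the region table standing in for a compiler's per-data-structure choice is hypothetical, not the paper's VGAT implementation.

```python
WORD = 8         # bytes covered by one lock at word granularity (assumption: 64-bit words)
CACHE_LINE = 64  # bytes covered by one lock at cache-line granularity (assumption)


def granularity_of(addr, regions, default=WORD):
    """Tracking unit for an address. `regions` lists (start, end, granularity)
    triples, as a compiler pass might assign per shared data structure."""
    for start, end, gran in regions:
        if start <= addr < end:
            return gran
    return default


def lock_index(addr, granularity, table_size):
    """Slot in a fixed-size lock table: all addresses in one tracking unit share a
    lock, and distinct units that land in the same slot alias each other."""
    return (addr // granularity) % table_size


def conflicts(writes_a, writes_b, regions=(), default=WORD, table_size=1 << 20):
    """True if two transactions' write sets map to at least one common lock."""
    def locks(writes):
        return {lock_index(a, granularity_of(a, regions, default), table_size)
                for a in writes}
    return not locks(writes_a).isdisjoint(locks(writes_b))


# Two different words in the same 64-byte line:
print(conflicts([0x1000], [0x1008]))                      # False: word granularity
print(conflicts([0x1000], [0x1008], default=CACHE_LINE))  # True: false conflict
```

Word granularity avoids the false conflict but needs one lock per word touched, while a small table makes unrelated words alias; choosing the unit per data structure is the knob VGAT turns.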

Relevance: 40.00%

Abstract:

The band alignment of the resistive random access memory (RRAM) switching material Ta2O5 with different metal electrode materials was examined using high-resolution X-ray photoelectron spectroscopy. Schottky and hole barrier heights at the interface between the electrode and Ta2O5 were obtained for electrodes ranging from low to high work function (Φm,vac from 4.06 to 5.93 eV). Effective metal work functions were extracted to study the Fermi level pinning effect and to discuss the dominant conduction mechanism. An accurate band alignment between the electrodes and Ta2O5 is obtained and can be used for RRAM electrode engineering and conduction mechanism studies. © 2013 American Institute of Physics.
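Barrier heights, vacuum work functions, and effective work functions are commonly related through the standard Schottky-barrier relations below. This is a textbook sketch under the usual charge-neutrality-level picture, not the paper's extraction procedure; the symbols χ (electron affinity of the oxide), E_g (band gap), Φ_CNL (charge neutrality level measured from vacuum), and S (pinning factor) are introduced here and do not appear in the abstract.

```latex
\begin{aligned}
\phi_{B,n} &= \Phi_{m,\mathrm{vac}} - \chi
  && \text{(Schottky-Mott limit, } S = 1\text{)}\\
\Phi_{m,\mathrm{eff}} &= \Phi_{\mathrm{CNL}} + S\,\bigl(\Phi_{m,\mathrm{vac}} - \Phi_{\mathrm{CNL}}\bigr),
  && 0 \le S \le 1\\
\phi_{B,n} &= \Phi_{m,\mathrm{eff}} - \chi,
  \qquad \phi_{B,p} = E_g - \phi_{B,n}\\
S &= \frac{\partial \phi_{B,n}}{\partial \Phi_{m,\mathrm{vac}}}
\end{aligned}
```

A pinning factor S close to 0 means the electron barrier barely changes as the vacuum work function sweeps from 4.06 to 5.93 eV, which is what comparing Φm,vac with the extracted Φm,eff reveals.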

Relevance: 40.00%

Abstract:

Submitted by 阎军 (yanj@red.semi.ac.cn) on 2010-06-07T01:33:41Z No. of bitstreams: 1 ApplPhysLett_96_213505.pdf: 1153920 bytes, checksum: 69931d8deb797813dd478b5dd0e292c0 (MD5)

Relevance: 40.00%

Abstract:

Influence diagrams are intuitive and concise representations of structured decision problems. When the problem is non-Markovian, an optimal strategy can be exponentially large in the size of the diagram. We can avoid this inherent intractability by constraining the size of admissible strategies, giving rise to limited memory influence diagrams. An important question is then how small strategies need to be to enable efficient optimal planning. Arguably, the smallest strategies one can conceive simply prescribe an action for each time step, without considering past decisions or observations. Previous work has shown that finding such optimal strategies is NP-hard even for polytree-shaped diagrams with ternary variables and a single value node, but the case of binary variables was left open. In this paper we address this case, first noting that optimal strategies can be obtained in polynomial time for polytree-shaped diagrams with binary variables and a single value node. We then show that the same problem is NP-hard if the diagram has multiple value nodes. These two results close the fixed-parameter complexity analysis of optimal strategy selection in influence diagrams parametrized by the shape of the diagram, the number of value nodes, and the maximum variable cardinality.
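To make "strategies that prescribe one action per time step, ignoring observations" concrete, the Python sketch below finds the best such strategy for binary decisions by exhaustive search over all 2^n action vectors. It is deliberately the naive exponential baseline, not the paper's polynomial-time method for polytrees; the toy umbrella problem and the function names are hypothetical.

```python
from itertools import product


def best_open_loop_strategy(n_decisions, outcome_dist, utility):
    """Exhaustive search over strategies that fix one binary action per decision.

    outcome_dist(actions) -> {outcome: probability} of the chance variables given
    the fixed actions; utility(outcome, actions) -> total utility, i.e. the sum
    of all value nodes. All 2**n_decisions candidates are evaluated."""
    best, best_eu = None, float("-inf")
    for actions in product((0, 1), repeat=n_decisions):
        eu = sum(p * utility(o, actions) for o, p in outcome_dist(actions).items())
        if eu > best_eu:
            best, best_eu = actions, eu
    return best, best_eu


# Hypothetical toy problem: rain (outcome 1) with probability 0.3, independent of
# the decisions; decision 0 is "take an umbrella", decision 1 is "leave early".
def weather(actions):
    return {0: 0.7, 1: 0.3}


def total_utility(rain, actions):
    umbrella, early = actions
    comfort = {(0, 0): 10, (0, 1): 8, (1, 0): 0, (1, 1): 7}[(rain, umbrella)]
    punctuality = 2 if early else 0  # second value node
    return comfort + punctuality


print(best_open_loop_strategy(2, weather, total_utility))
```

Here the best strategy is (1, 1) with expected utility about 9.7; the search space doubles with every added decision, which is why the complexity results above matter.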

Relevance: 40.00%

Abstract:

A thermally activated photoluminescence memory effect, induced by a reversible order-disorder phase transition of the alkyl chains, is reported for highly organized bilayer alkyl/siloxane hybrids (see figure; left at room temperature, right at 120 °C). The emission energy is sensitive to the annihilation/formation of the hydrogen-bonded amide-amide array, displaying a unique nanoscopic sensitivity (ca. 150 nm).