951 resultados para thread programming
Resumo:
Teaching Operating Systems (OS) is a rather hard task, since being an OS designer is not a desired goal for most students and the subject demands a large amount of knowledge over system's details. To reduce the difficulty many courses are planned with laboratory practices, differing in how the practices are designed. Some try to implement next-to-real kernels, others use simulators, and even others use synthetic kernels. In this paper an approach based on synthetic kernels is described. It uses thread programming in order to establish control over the operating system components. T his approach allows the kernel to grow following the materials presented in the course. It has been successfully applied in two different courses at our University, the first one being a basic OS course and the second one an upper level course. Results from these applications are presented.
Resumo:
We present in this paper several contributions on the collision detection optimization centered on hardware performance. We focus on the broad phase which is the first step of the collision detection process and propose three new ways of parallelization of the well-known Sweep and Prune algorithm. We first developed a multi-core model takes into account the number of available cores. Multi-core architecture enables us to distribute geometric computations with use of multi-threading. Critical writing section and threads idling have been minimized by introducing new data structures for each thread. Programming with directives, like OpenMP, appears to be a good compromise for code portability. We then proposed a new GPU-based algorithm also based on the "Sweep and Prune" that has been adapted to multi-GPU architectures. Our technique is based on a spatial subdivision method used to distribute computations among GPUs. Results show that significant speed-up can be obtained by passing from 1 to 4 GPUs in a large-scale environment.
Resumo:
This paper presents the unique collection of additional features of Qu-Prolog, a variant of the Al programming language Prolog, and illustrates how they can be used for implementing DAI applications. By this we mean applications comprising communicating information servers, expert systems, or agents, with sophisticated reasoning capabilities and internal concurrency. Such an application exploits the key features of Qu-Prolog: support for the programming of sound non-clausal inference systems, multi-threading, and high level inter-thread message communication between Qu-Prolog query threads anywhere on the internet. The inter-thread communication uses email style symbolic names for threads, allowing easy construction of distributed applications using public names for threads. How threads react to received messages is specified by a disjunction of reaction rules which the thread periodically executes. A communications API allows smooth integration of components written in C, which to Qu-Prolog, look like remote query threads.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
The furious pace of Moore's Law is driving computer architecture into a realm where the the speed of light is the dominant factor in system latencies. The number of clock cycles to span a chip are increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has previously made migration an unserviceable solution so far. I present an architecture, implementation, and mechanisms that reduces the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, only by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, as the most popular model for shared-memory parallel programming as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit into the GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
Resumo:
This paper addresses the non-preemptive single machine scheduling problem to minimize total tardiness. We are interested in the online version of this problem, where orders arrive at the system at random times. Jobs have to be scheduled without knowledge of what jobs will come afterwards. The processing times and the due dates become known when the order is placed. The order release date occurs only at the beginning of periodic intervals. A customized approximate dynamic programming method is introduced for this problem. The authors also present numerical experiments that assess the reliability of the new approach and show that it performs better than a myopic policy.
Resumo:
The economic occupation of an area of 500 ha for Piracicaba was studied with the irrigated cultures of maize, tomato, sugarcane and beans, having used models of deterministic linear programming and linear programming including risk for the Target-Motad model, where two situations had been analyzed. In the deterministic model the area was the restrictive factor and the water was not restrictive for none of the tested situations. For the first situation the gotten maximum income was of R$ 1,883,372.87 and for the second situation it was of R$ 1,821,772.40. In the model including risk a producer that accepts risk can in the first situation get the maximum income of R$ 1,883,372. 87 with a minimum risk of R$ 350 year(-1), and in the second situation R$ 1,821,772.40 with a minimum risk of R$ 40 year(-1). Already a producer averse to the risk can get in the first situation a maximum income of R$ 1,775,974.81 with null risk and for the second situation R$ 1.707.706, 26 with null risk, both without water restriction. These results stand out the importance of the inclusion of the risk in supplying alternative occupations to the producer, allowing to a producer taking of decision considered the risk aversion and the pretension of income.
Resumo:
These notes follow on from the material that you studied in CSSE1000 Introduction to Computer Systems. There you studied details of logic gates, binary numbers and instruction set architectures using the Atmel AVR microcontroller family as an example. In your present course (METR2800 Team Project I), you need to get on to designing and building an application which will include such a microcontroller. These notes focus on programming an AVR microcontroller in C and provide a number of example programs to illustrate the use of some of the AVR peripheral devices.
Resumo:
Background. Age-related motor slowing may reflect either motor programming deficits, poorer movement execution, or mere strategic preferences for online guidance of movement. We controlled such preferences, limiting the extent to which movements could be programmed. Methods. Twenty-four young and 24 older adults performed a line drawing task that allowed movements to he prepared in advance in one case (i.e., cue initially available indicating target location) and not in another (i.e., no cue initially available as to target location). Participants connected large or small targets illuminated by light-emitting diodes upon a graphics tablet that sampled pen tip position at 200 Hz. Results. Older adults had a disproportionate difficulty initiating movement when prevented from programming in advance. Older adults produced slower, less efficient movements, particularly when prevented from programming under greater precision requirements. Conclusions. The slower movements of older adults do not simply reflect a preference for online control, as older adults have less efficient movements when forced to reprogram their movements. Age-related motor slowing kinematically resembles that seen in patients with cerebellar dysfunction.
Resumo:
1. Establishing biological control agents in the field is a major step in any classical biocontrol programme, yet there are few general guidelines to help the practitioner decide what factors might enhance the establishment of such agents. 2. A stochastic dynamic programming (SDP) approach, linked to a metapopulation model, was used to find optimal release strategies (number and size of releases), given constraints on time and the number of biocontrol agents available. By modelling within a decision-making framework we derived rules of thumb that will enable biocontrol workers to choose between management options, depending on the current state of the system. 3. When there are few well-established sites, making a few large releases is the optimal strategy. For other states of the system, the optimal strategy ranges from a few large releases, through a mixed strategy (a variety of release sizes), to many small releases, as the probability of establishment of smaller inocula increases. 4. Given that the probability of establishment is rarely a known entity, we also strongly recommend a mixed strategy in the early stages of a release programme, to accelerate learning and improve the chances of finding the optimal approach.
Resumo:
1. A model of the population dynamics of Banksia ornata was developed, using stochastic dynamic programming (a state-dependent decision-making tool), to determine optimal fire management strategies that incorporate trade-offs between biodiversity conservation and fuel reduction. 2. The modelled population of B. ornata was described by its age and density, and was exposed to the risk of unplanned fires and stochastic variation in germination success. 3. For a given population in each year, three management strategies were considered: (i) lighting a prescribed fire; (ii) controlling the incidence of unplanned fire; (iii) doing nothing. 4. The optimal management strategy depended on the state of the B. ornata population, with the time since the last fire (age of the population) being the most important variable. Lighting a prescribed fire at an age of less than 30 years was only optimal when the density of seedlings after a fire was low (< 100 plants ha(-1)) or when there were benefits of maintaining a low fuel load by using more frequent fire. 5. Because the cost of management was assumed to be negligible (relative to the value of the persistence of the population), the do-nothing option was never the optimal strategy, although lighting prescribed fires had only marginal benefits when the mean interval between unplanned fires was less than 20-30 years.