598 resultados para processor


Relevância:

10.00% 10.00%

Publicador:

Resumo:

A parallel processor architecture based on a communicating sequential processor chip, the transputer, is described. The architecture is easily linearly extensible to enable separate functions to be included in the controller. To demonstrate the power of the resulting controller some experimental results are presented comparing PID and full inverse dynamics on the first three joints of a Puma 560 robot. Also examined are some of the sample rate issues raised by the asynchronous updating of inertial parameters, and the need for full inverse dynamics at every sample interval is questioned.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The Normal Quantile Transform (NQT) has been used in many hydrological and meteorological applications in order to make the Cumulated Distribution Function (CDF) of the observed, simulated and forecast river discharge, water level or precipitation data Gaussian. It is also the heart of the meta-Gaussian model for assessing the total predictive uncertainty of the Hydrological Uncertainty Processor (HUP) developed by Krzysztofowicz. In the field of geo-statistics this transformation is better known as the Normal-Score Transform. In this paper some possible problems caused by small sample sizes when applying the NQT in flood forecasting systems will be discussed and a novel way to solve the problem will be outlined by combining extreme value analysis and non-parametric regression methods. The method will be illustrated by examples of hydrological stream-flow forecasts.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many operational weather forecasting centres use semi-implicit time-stepping schemes because of their good efficiency. However, as computers become ever more parallel, horizontally explicit solutions of the equations of atmospheric motion might become an attractive alternative due to the additional inter-processor communication of implicit methods. Implicit and explicit (IMEX) time-stepping schemes have long been combined in models of the atmosphere using semi-implicit, split-explicit or HEVI splitting. However, most studies of the accuracy and stability of IMEX schemes have been limited to the parabolic case of advection–diffusion equations. We demonstrate how a number of Runge–Kutta IMEX schemes can be used to solve hyperbolic wave equations either semi-implicitly or HEVI. A new form of HEVI splitting is proposed, UfPreb, which dramatically improves accuracy and stability of simulations of gravity waves in stratified flow. As a consequence it is found that there are HEVI schemes that do not lose accuracy in comparison to semi-implicit ones. The stability limits of a number of variations of trapezoidal implicit and some Runge–Kutta IMEX schemes are found and the schemes are tested on two vertical slice cases using the compressible Boussinesq equations split into various combinations of implicit and explicit terms. Some of the Runge–Kutta schemes are found to be beneficial over trapezoidal, especially since they damp high frequencies without dropping to first-order accuracy. We test schemes that are not formally accurate for stiff systems but in stiff limits (nearly incompressible) and find that they can perform well. The scheme ARK2(2,3,2) performs the best in the tests.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hybrid multiprocessor architectures which combine re-configurable computing and multiprocessors on a chip are being proposed to transcend the performance of standard multi-core parallel systems. Both fine-grained and coarse-grained parallel algorithm implementations are feasible in such hybrid frameworks. A compositional strategy for designing fine-grained multi-phase regular processor arrays to target hybrid architectures is presented in this paper. The method is based on deriving component designs using classical regular array techniques and composing the components into a unified global design. Effective designs with phase-changes and data routing at run-time are characteristics of these designs. In order to describe the data transfer between phases, the concept of communication domain is introduced so that the producer–consumer relationship arising from multi-phase computation can be treated in a unified way as a data routing phase. This technique is applied to derive new designs of multi-phase regular arrays with different dataflow between phases of computation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Mathematics in Defence 2011 Abstract. We review transreal arithmetic and present transcomplex arithmetic. These arithmetics have no exceptions. This leads to incremental improvements in computer hardware and software. For example, the range of real numbers, encoded by floating-point bits, is doubled when all of the Not-a-Number(NaN) states, in IEEE 754 arithmetic, are replaced with real numbers. The task of programming such systems is simplified and made safer by discarding the unordered relational operator,leaving only the operators less-than, equal-to, and greater than. The advantages of using a transarithmetic in a computation, or transcomputation as we prefer to call it, may be had by making small changes to compilers and processor designs. However, radical change is possible by exploiting the reliability of transcomputations to make pipelined dataflow machines with a large number of cores. Our initial designs are for a machine with order one million cores. Such a machine can complete the execution of multiple in-line programs each clock tick

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Performance modelling is a useful tool in the lifeycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These models are compared with real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. Some success is had in relating source code to achieved performance for the K10 series of Opterons, but the method is found to be inadequate for the next-generation Interlagos processor. The experience leads to the investigation of a data-driven application benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow model as an example.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A virtual system that emulates an ARM-based processor machine has been created to replace a traditional hardware-based system for teaching assembly language. The proposed virtual system integrates, in a single environment, all the development tools necessary to deliver introductory or advanced courses on modern assembly language programming. The virtual system runs a Linux operating system in either a graphical or console mode on a Windows or Linux host machine. No software licenses or extra hardware are required to use the virtual system, thus students are free to carry their own ARM emulator with them on a USB memory stick. Institutions adopting this, or a similar virtual system, can also benefit by reducing capital investment in hardware-based development kits and enable distance learning courses.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A large body of psycholinguistic research has revealed that during sentence interpretation adults coordinate multiple sources of information. Particularly, they draw both on linguistic properties of the message and on information from the context to constrain their interpretations. Relatively little however is known about how this integrative processor develops through language acquisition and about how children process language. In this study, two on-line picture verification tasks were used to examine how 1st, 2nd and 4th/5th grade monolingual Greek children resolve pronoun ambiguities during sentence interpretation and how their performance compares to that of adults on the same tasks. Specifically, we manipulated the type of subject pronoun, i.e. null or overt, and examined how this affected participants’ preferences for competing antecedents, i.e. in the subject or object position. The results revealed both similarities and differences in how adults and the various child groups comprehended ambiguous pronominal forms. Particularly, although adults and children alike showed sensitivity to the distribution of overt and null subject pronouns, this did not always lead to convergent interpretation preferences.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Wireless Sensor Networks (WSNs) have been an exciting topic in recent years. The services offered by a WSN can be classified into three major categories: monitoring, alerting, and information on demand. WSNs have been used for a variety of applications related to the environment (agriculture, water and forest fire detection), the military, buildings, health (elderly people and home monitoring), disaster relief, and area or industrial monitoring. In most WSNs tasks like processing the sensed data, making decisions and generating emergency messages are carried out by a remote server, hence the need for efficient means of transferring data across the network. Because of the range of applications and types of WSN there is a need for different kinds of MAC and routing protocols in order to guarantee delivery of data from the source nodes to the server (or sink). In order to minimize energy consumption and increase performance in areas such as reliability of data delivery, extensive research has been conducted and documented in the literature on designing energy efficient protocols for each individual layer. The most common way to conserve energy in WSNs involves using the MAC layer to put the transceiver and the processor of the sensor node into a low power, sleep state when they are not being used. Hence the energy wasted due to collisions, overhearing and idle listening is reduced. As a result of this strategy for saving energy, the routing protocols need new solutions that take into account the sleep state of some nodes, and which also enable the lifetime of the entire network to be increased by distributing energy usage between nodes over time. This could mean that a combined MAC and routing protocol could significantly improve WSNs because the interaction between the MAC and network layers lets nodes be active at the same time in order to deal with data transmission. In the research presented in this thesis, a cross-layer protocol based on MAC and routing protocols was designed in order to improve the capability of WSNs for a range of different applications. Simulation results, based on a range of realistic scenarios, show that these new protocols improve WSNs by reducing their energy consumption as well as enabling them to support mobile nodes, where necessary. A number of conference and journal papers have been published to disseminate these results for a range of applications.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present a new technique for obtaining model fittings to very long baseline interferometric images of astrophysical jets. The method minimizes a performance function proportional to the sum of the squared difference between the model and observed images. The model image is constructed by summing N(s) elliptical Gaussian sources characterized by six parameters: two-dimensional peak position, peak intensity, eccentricity, amplitude, and orientation angle of the major axis. We present results for the fitting of two main benchmark jets: the first constructed from three individual Gaussian sources, the second formed by five Gaussian sources. Both jets were analyzed by our cross-entropy technique in finite and infinite signal-to-noise regimes, the background noise chosen to mimic that found in interferometric radio maps. Those images were constructed to simulate most of the conditions encountered in interferometric images of active galactic nuclei. We show that the cross-entropy technique is capable of recovering the parameters of the sources with a similar accuracy to that obtained from the very traditional Astronomical Image Processing System Package task IMFIT when the image is relatively simple (e. g., few components). For more complex interferometric maps, our method displays superior performance in recovering the parameters of the jet components. Our methodology is also able to show quantitatively the number of individual components present in an image. An additional application of the cross-entropy technique to a real image of a BL Lac object is shown and discussed. Our results indicate that our cross-entropy model-fitting technique must be used in situations involving the analysis of complex emission regions having more than three sources, even though it is substantially slower than current model-fitting tasks (at least 10,000 times slower for a single processor, depending on the number of sources to be optimized). As in the case of any model fitting performed in the image plane, caution is required in analyzing images constructed from a poorly sampled (u, v) plane.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper proposes a parallel hardware architecture for image feature detection based on the Scale Invariant Feature Transform algorithm and applied to the Simultaneous Localization And Mapping problem. The work also proposes specific hardware optimizations considered fundamental to embed such a robotic control system on-a-chip. The proposed architecture is completely stand-alone; it reads the input data directly from a CMOS image sensor and provides the results via a field-programmable gate array coupled to an embedded processor. The results may either be used directly in an on-chip application or accessed through an Ethernet connection. The system is able to detect features up to 30 frames per second (320 x 240 pixels) and has accuracy similar to a PC-based implementation. The achieved system performance is at least one order of magnitude better than a PC-based solution, a result achieved by investigating the impact of several hardware-orientated optimizations oil performance, area and accuracy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In 2006 the Route load balancing algorithm was proposed and compared to other techniques aiming at optimizing the process allocation in grid environments. This algorithm schedules tasks of parallel applications considering computer neighborhoods (where the distance is defined by the network latency). Route presents good results for large environments, although there are cases where neighbors do not have an enough computational capacity nor communication system capable of serving the application. In those situations the Route migrates tasks until they stabilize in a grid area with enough resources. This migration may take long time what reduces the overall performance. In order to improve such stabilization time, this paper proposes RouteGA (Route with Genetic Algorithm support) which considers historical information on parallel application behavior and also the computer capacities and load to optimize the scheduling. This information is extracted by using monitors and summarized in a knowledge base used to quantify the occupation of tasks. Afterwards, such information is used to parameterize a genetic algorithm responsible for optimizing the task allocation. Results confirm that RouteGA outperforms the load balancing carried out by the original Route, which had previously outperformed others scheduling algorithms from literature.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents the use of a multiprocessor architecture for the performance improvement of tomographic image reconstruction. Image reconstruction in computed tomography (CT) is an intensive task for single-processor systems. We investigate the filtered image reconstruction suitability based on DSPs organized for parallel processing and its comparison with the Message Passing Interface (MPI) library. The experimental results show that the speedups observed for both platforms were increased in the same direction of the image resolution. In addition, the execution time to communication time ratios (Rt/Rc) as a function of the sample size have shown a narrow variation for the DSP platform in comparison with the MPI platform, which indicates its better performance for parallel image reconstruction.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In order to achieve the high performance, we need to have an efficient scheduling of a parallelprogram onto the processors in multiprocessor systems that minimizes the entire executiontime. This problem of multiprocessor scheduling can be stated as finding a schedule for ageneral task graph to be executed on a multiprocessor system so that the schedule length can be minimize [10]. This scheduling problem is known to be NP- Hard.In multi processor task scheduling, we have a number of CPU’s on which a number of tasksare to be scheduled that the program’s execution time is minimized. According to [10], thetasks scheduling problem is a key factor for a parallel multiprocessor system to gain betterperformance. A task can be partitioned into a group of subtasks and represented as a DAG(Directed Acyclic Graph), so the problem can be stated as finding a schedule for a DAG to beexecuted in a parallel multiprocessor system so that the schedule can be minimized. Thishelps to reduce processing time and increase processor utilization. The aim of this thesis workis to check and compare the results obtained by Bee Colony algorithm with already generatedbest known results in multi processor task scheduling domain.