882 resultados para Parallel processing systems
Resumo:
The number of applications based on embedded systems grows significantly every year, even with the fact that embedded systems have restrictions, and simple processing units, the performance of these has improved every day. However the complexity of applications also increase, a better performance will always be necessary. So even such advances, there are cases, which an embedded system with a single unit of processing is not sufficient to achieve the information processing in real time. To improve the performance of these systems, an implementation with parallel processing can be used in more complex applications that require high performance. The idea is to move beyond applications that already use embedded systems, exploring the use of a set of units processing working together to implement an intelligent algorithm. The number of existing works in the areas of parallel processing, systems intelligent and embedded systems is wide. However works that link these three areas to solve any problem are reduced. In this context, this work aimed to use tools available for FPGA architectures, to develop a platform with multiple processors to use in pattern classification with artificial neural networks
Resumo:
In some applications with case-based system, the attributes available for indexing are better described as linguistic variables instead of receiving numerical treatment. In these applications, the concept of fuzzy hypercube can be applied to give a geometrical interpretation of similarities among cases. This paper presents an approach that uses geometrical properties of fuzzy hypercube space to make indexing and retrieval processes of cases.
Resumo:
A novel hybrid high power rectifier capable to achieve unity power factor is proposed in this paper. Single-phase SEPIC rectifiers are associated in parallel with each leg of three-phase 6-pulse diode rectifier resulting in a programmable input current waveform structure. In this paper it is described the principles of operation of the proposed converter with detailed simulation and experimental results. For a total harmonic distortion of the input line current (THDI) less than 2% the rated power of the SEPIC rectifiers is 33%. Therefore, power rating of the SEPIC parallel converters is a fraction of the output power, on the range of 20% to 33% of the nominal output power, making the proposed solution economically viable for high power installations, with fast pay back of the investment. Moreover, retrofits to existing installations are also possible with this proposed topology, since the parallel path can be easily controlled by integration with the already existing de-link. Experimental results are presented for a 3 kW implemented prototype, in order to verify the developed analysis.
Resumo:
An improvement to the quality bidimensional Delaunay mesh generation algorithm, which combines the mesh refinement algorithms strategy of Ruppert and Shewchuk is proposed in this research. The developed technique uses diametral lenses criterion, introduced by L. P. Chew, with the purpose of eliminating the extremely obtuse triangles in the boundary mesh. This method splits the boundary segment and obtains an initial prerefinement, and thus reducing the number of necessary iterations to generate a high quality sequential triangulation. Moreover, it decreases the intensity of the communication and synchronization between subdomains in parallel mesh refinement. © 2008 IEEE.
Resumo:
Software Transactional Memory (STM) systems have poor performance under high contention scenarios. Since many transactions compete for the same data, most of them are aborted, wasting processor runtime. Contention management policies are typically used to avoid that, but they are passive approaches as they wait for an abort to happen so they can take action. More proactive approaches have emerged, trying to predict when a transaction is likely to abort so its execution can be delayed. Such techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction should run. In this paper we propose LUTS, a Lightweight User-Level Transaction Scheduler, which is based on an execution context record mechanism. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. Moreover, it avoids most of the issues caused by pseudo parallelism, as it only launches as many system-level threads as the number of available processor cores. We discuss LUTS design and present three conflict-avoidance heuristics built around LUTS scheduling capabilities. Experimental results, conducted with STMBench7 and STAMP benchmark suites, show LUTS efficiency when running high contention applications and how conflict-avoidance heuristics can improve STM performance even more. In fact, our transaction scheduling techniques are capable of improving program performance even in overloaded scenarios. © 2011 Springer-Verlag.
Resumo:
Transactional memory (TM) is a new synchronization mechanism devised to simplify parallel programming, thereby helping programmers to unleash the power of current multicore processors. Although software implementations of TM (STM) have been extensively analyzed in terms of runtime performance, little attention has been paid to an equally important constraint faced by nearly all computer systems: energy consumption. In this work we conduct a comprehensive study of energy and runtime tradeoff sin software transactional memory systems. We characterize the behavior of three state-of-the-art lock-based STM algorithms, along with three different conflict resolution schemes. As a result of this characterization, we propose a DVFS-based technique that can be integrated into the resolution policies so as to improve the energy-delay product (EDP). Experimental results show that our DVFS-enhanced policies are indeed beneficial for applications with high contention levels. Improvements of up to 59% in EDP can be observed in this scenario, with an average EDP reduction of 16% across the STAMP workloads. © 2012 IEEE.
Resumo:
The means through which the nervous system perceives its environment is one of the most fascinating questions in contemporary science. Our endeavors to comprehend the principles of neural science provide an instance of how biological processes may inspire novel methods in mathematical modeling and engineering. The application ofmathematical models towards understanding neural signals and systems represents a vibrant field of research that has spanned over half a century. During this period, multiple approaches to neuronal modeling have been adopted, and each approach is adept at elucidating a specific aspect of nervous system function. Thus while bio-physical models have strived to comprehend the dynamics of actual physical processes occurring within a nerve cell, the phenomenological approach has conceived models that relate the ionic properties of nerve cells to transitions in neural activity. Further-more, the field of neural networks has endeavored to explore how distributed parallel processing systems may become capable of storing memory. Through this project, we strive to explore how some of the insights gained from biophysical neuronal modeling may be incorporated within the field of neural net-works. We specifically study the capabilities of a simple neural model, the Resonate-and-Fire (RAF) neuron, whose derivation is inspired by biophysical neural modeling. While reflecting further biological plausibility, the RAF neuron is also analytically tractable, and thus may be implemented within neural networks. In the following thesis, we provide a brief overview of the different approaches that have been adopted towards comprehending the properties of nerve cells, along with the framework under which our specific neuron model relates to the field of neuronal modeling. Subsequently, we explore some of the time-dependent neurocomputational capabilities of the RAF neuron, and we utilize the model to classify logic gates, and solve the classic XOR problem. Finally we explore how the resonate-and-fire neuron may be implemented within neural networks, and how such a network could be adapted through the temporal backpropagation algorithm.
Resumo:
Parallel processing systems require complex interconnection networks. In order to obtain fast and flexible communications at a reasonable cost, different types of networks has been studied in the past. None of them can be considered best. The cost-effectiveness of a particular network design depends of several factors that will not be treat here. Nevertheless, the basic device that configurate an interconnection network can be the same for most of them. In this way, an Optical Interconetion Network made with Holographic Optical Element (HOE) is presented. The HOE recording way use present special caracteristics that are described. A Perfect Shuffle and Banyan networks has been implemented.
Resumo:
Embedded real-time applications increasingly present high computation requirements, which need to be completed within specific deadlines, but that present highly variable patterns, depending on the set of data available in a determined instant. The current trend to provide parallel processing in the embedded domain allows providing higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization, and increased available, but unusable, processing in the overall system. A solution for this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, on peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables the structured reasoning on the timing behaviour of these hybrid architectures.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
The subdivisions of human inferior colliculus are currently based on Golgi and Nissl-stained preparations. We have investigated the distribution of calcium-binding protein immunoreactivity in the human inferior colliculus and found complementary or mutually exclusive localisations of parvalbumin versus calbindin D-28k and calretinin staining. The central nucleus of the inferior colliculus but not the surrounding regions contained parvalbumin-positive neuronal somata and fibres. Calbindin-positive neurons and fibres were concentrated in the dorsal aspect of the central nucleus and in structures surrounding it: the dorsal cortex, the lateral lemniscus, the ventrolateral nucleus, and the intercollicular region. In the dorsal cortex, labelling of calbindin and calretinin revealed four distinct layers.Thus, calcium-binding protein reactivity reveals in the human inferior colliculus distinct neuronal populations that are anatomically segregated. The different calcium-binding protein-defined subdivisions may belong to parallel auditory pathways that were previously demonstrated in non-human primates, and they may constitute a first indication of parallel processing in human subcortical auditory structures.
Resumo:
The performance of the parallel vector implementation of the one- and two-dimensional orthogonal transforms is evaluated. The orthogonal transforms are computed using actual or modified fast Fourier transform (FFT) kernels. The factors considered in comparing the speed-up of these vectorized digital signal processing algorithms are discussed and it is shown that the traditional way of comparing th execution speed of digital signal processing algorithms by the ratios of the number of multiplications and additions is no longer effective for vector implementation; the structure of the algorithm must also be considered as a factor when comparing the execution speed of vectorized digital signal processing algorithms. Simulation results on the Cray X/MP with the following orthogonal transforms are presented: discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), discrete Hartley transform (DHT), discrete Walsh transform (DWHT), and discrete Hadamard transform (DHDT). A comparison between the DHT and the fast Hartley transform is also included.(34 refs)
Resumo:
"UILU-ENG 78 1745."
Resumo:
Includes bibliographical references.