843 resultados para parallel processing systems
Resumo:
The reverse engineering of a skeleton based programming environment and redesign to distribute management activities of the system and thereby remove a potential single point of failure is considered. The Ore notation is used to facilitate abstraction of the design and analysis of its properties. It is argued that Ore is particularly suited to this role as this type of management is essentially an orchestration activity. The Ore specification of the original version of the system is modified via a series of semi-formally justified derivation steps to obtain a specification of the decentralized management version which is then used as a basis for its implementation. Analysis of the two specifications allows qualitative prediction of the expected performance of the derived version with respect to the original, and this prediction is borne out in practice.
Resumo:
The main motivation for the work presented here began with previously conducted experiments with a programming concept at the time named "Macro". These experiments led to the conviction that it would be possible to build a system of engine control from scratch, which could eliminate many of the current problems of engine management systems in a direct and intrinsic way. It was also hoped that it would minimize the full range of software and hardware needed to make a final and fully functional system. Initially, this paper proposes to make a comprehensive survey of the state of the art in the specific area of software and corresponding hardware of automotive tools and automotive ECUs. Problems arising from such software will be identified, and it will be clear that practically all of these problems stem directly or indirectly from the fact that we continue to make comprehensive use of extremely long and complex "tool chains". Similarly, in the hardware, it will be argued that the problems stem from the extreme complexity and inter-dependency inside processor architectures. The conclusions are presented through an extensive list of "pitfalls" which will be thoroughly enumerated, identified and characterized. Solutions will also be proposed for the various current issues and for the implementation of these same solutions. All this final work will be part of a "proof-of-concept" system called "ECU2010". The central element of this system is the before mentioned "Macro" concept, which is an graphical block representing one of many operations required in a automotive system having arithmetic, logic, filtering, integration, multiplexing functions among others. The end result of the proposed work is a single tool, fully integrated, enabling the development and management of the entire system in one simple visual interface. Part of the presented result relies on a hardware platform fully adapted to the software, as well as enabling high flexibility and scalability in addition to using exactly the same technology for ECU, data logger and peripherals alike. Current systems rely on a mostly evolutionary path, only allowing online calibration of parameters, but never the online alteration of their own automotive functionality algorithms. By contrast, the system developed and described in this thesis had the advantage of following a "clean-slate" approach, whereby everything could be rethought globally. In the end, out of all the system characteristics, "LIVE-Prototyping" is the most relevant feature, allowing the adjustment of automotive algorithms (eg. Injection, ignition, lambda control, etc.) 100% online, keeping the engine constantly working, without ever having to stop or reboot to make such changes. This consequently eliminates any "turnaround delay" typically present in current automotive systems, thereby enhancing the efficiency and handling of such systems.
Resumo:
The Adaptive Generalized Predictive Control (AGPC) algorithm can be speeded up using parallel processing. Since the AGPC algorithm needs to be fed with the knowledge of the plant transfer function, the parallelization of a standard Recursive Least Squares (RLS) estimator and a GPC predictor is discussed here.
Resumo:
Least squares solutions are a very important problem, which appear in a broad range of disciplines (for instance, control systems, statistics, signal processing). Our interest in this kind of problems lies in their use of training neural network controllers.
Resumo:
Evaluation of blood-flow Doppler ultrasound spectral content is currently performed on clinical diagnosis. Since mean frequency and bandwidth spectral parameters are determinants on the quantification of stenotic degree, more precise estimators than the conventional Fourier transform should be seek. This paper summarizes studies led by the author in this field, as well as the strategies used to implement the methods in real-time. Regarding stationary and nonstationary characteristics of the blood-flow signal, different models were assessed. When autoregressive and autoregressive moving average models were compared with the traditional Fourier based methods in terms of their statistical performance while estimating both spectral parameters, the Modified Covariance model was identified by the cost/benefit criterion as the estimator presenting better performance. The performance of three time-frequency distributions and the Short Time Fourier Transform was also compared. The Choi-Williams distribution proved to be more accurate than the other methods. The identified spectral estimators were developed and optimized using high performance techniques. Homogeneous and heterogeneous architectures supporting multiple instruction multiple data parallel processing were essayed. Results obtained proved that real-time implementation of the blood-flow estimators is feasible, enhancing the usage of more complex spectral models on other ultrasonic systems.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
It has been previously demonstrated that extensive activation in the dorsolateral temporal lobes associated with masking a speech target with a speech masker, consistent with the hypothesis that competition for central auditory processes is an important factor in informational masking. Here, masking from speech and two additional maskers derived from the original speech were investigated. One of these is spectrally rotated speech, which is unintelligible and has a similar (inverted) spectrotemporal profile to speech. The authors also controlled for the possibility of “glimpsing” of the target signal during modulated masking sounds by using speech-modulated noise as a masker in a baseline condition. Functional imaging results reveal that masking speech with speech leads to bilateral superior temporal gyrus (STG) activation relative to a speech-in-noise baseline, while masking speech with spectrally rotated speech leads solely to right STG activation relative to the baseline. This result is discussed in terms of hemispheric asymmetries for speech perception, and interpreted as showing that masking effects can arise through two parallel neural systems, in the left and right temporal lobes. This has implications for the competition for resources caused by speech and rotated speech maskers, and may illuminate some of the mechanisms involved in informational masking.
Resumo:
One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
Resumo:
It has been previously demonstrated that extensive activation in the dorsolateral temporal lobes associated with masking a speech target with a speech masker, consistent with the hypothesis that competition for central auditory processes is an important factor in informational masking. Here, masking from speech and two additional maskers derived from the original speech were investigated. One of these is spectrally rotated speech, which is unintelligible and has a similar (inverted) spectrotemporal profile to speech. The authors also controlled for the possibility of "glimpsing" of the target signal during modulated masking sounds by using speech-modulated noise as a masker in a baseline condition. Functional imaging results reveal that masking speech with speech leads to bilateral superior temporal gyrus (STG) activation relative to a speech-in-noise baseline, while masking speech with spectrally rotated speech leads solely to right STG activation relative to the baseline. This result is discussed in terms of hemispheric asymmetries for speech perception, and interpreted as showing that masking effects can arise through two parallel neural systems, in the left and right temporal lobes. This has implications for the competition for resources caused by speech and rotated speech maskers, and may illuminate some of the mechanisms involved in informational masking.
Resumo:
This paper is concerned with the uniformization of a system of afine recurrence equations. This transformation is used in the design (or compilation) of highly parallel embedded systems (VLSI systolic arrays, signal processing filters, etc.). In this paper, we present and implement an automatic system to achieve uniformization of systems of afine recurrence equations. We unify the results from many earlier papers, develop some theoretical extensions, and then propose effective uniformization algorithms. Our results can be used in any high level synthesis tool based on polyhedral representation of nested loop computations.
Resumo:
A connection between a fuzzy neural network model with the mixture of experts network (MEN) modelling approach is established. Based on this linkage, two new neuro-fuzzy MEN construction algorithms are proposed to overcome the curse of dimensionality that is inherent in the majority of associative memory networks and/or other rule based systems. The first construction algorithm employs a function selection manager module in an MEN system. The second construction algorithm is based on a new parallel learning algorithm in which each model rule is trained independently, for which the parameter convergence property of the new learning method is established. As with the first approach, an expert selection criterion is utilised in this algorithm. These two construction methods are equivalent in their effectiveness in overcoming the curse of dimensionality by reducing the dimensionality of the regression vector, but the latter has the additional computational advantage of parallel processing. The proposed algorithms are analysed for effectiveness followed by numerical examples to illustrate their efficacy for some difficult data based modelling problems.
Resumo:
Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
Resumo:
The web services (WS) technology provides a comprehensive solution for representing, discovering, and invoking services in a wide variety of environments, including Service Oriented Architectures (SOA) and grid computing systems. At the core of WS technology lie a number of XML-based standards, such as the Simple Object Access Protocol (SOAP), that have successfully ensured WS extensibility, transparency, and interoperability. Nonetheless, there is an increasing demand to enhance WS performance, which is severely impaired by XML's verbosity. SOAP communications produce considerable network traffic, making them unfit for distributed, loosely coupled, and heterogeneous computing environments such as the open Internet. Also, they introduce higher latency and processing delays than other technologies, like Java RMI and CORBA. WS research has recently focused on SOAP performance enhancement. Many approaches build on the observation that SOAP message exchange usually involves highly similar messages (those created by the same implementation usually have the same structure, and those sent from a server to multiple clients tend to show similarities in structure and content). Similarity evaluation and differential encoding have thus emerged as SOAP performance enhancement techniques. The main idea is to identify the common parts of SOAP messages, to be processed only once, avoiding a large amount of overhead. Other approaches investigate nontraditional processor architectures, including micro-and macrolevel parallel processing solutions, so as to further increase the processing rates of SOAP/XML software toolkits. This survey paper provides a concise, yet comprehensive review of the research efforts aimed at SOAP performance enhancement. A unified view of the problem is provided, covering almost every phase of SOAP processing, ranging over message parsing, serialization, deserialization, compression, multicasting, security evaluation, and data/instruction-level processing.
Resumo:
Classical Pavlovian fear conditioning to painful stimuli has provided the generally accepted view of a core system centered in the central amygdala to organize fear responses. Ethologically based models using other sources of threat likely to be expected in a natural environment, such as predators or aggressive dominant conspecifics, have challenged this concept of a unitary core circuit for fear processing. We discuss here what the ethologically based models have told us about the neural systems organizing fear responses. We explored the concept that parallel paths process different classes of threats, and that these different paths influence distinct regions in the periaqueductal gray - a critical element for the organization of all kinds of fear responses. Despite this parallel processing of different kinds of threats, we have discussed an interesting emerging view that common cortical-hippocampal-amygdalar paths seem to be engaged in fear conditioning to painful stimuli, to predators and, perhaps, to aggressive dominant conspecifics as well. Overall, the aim of this review is to bring into focus a more global and comprehensive view of the systems organizing fear responses.