922 resultados para Parallel numerical algorithms
Resumo:
In this paper a parallel implementation of an Adaprtive Generalized Predictive Control (AGPC) algorithm is presented. Since the AGPC algorithm needs to be fed with knowledge of the plant transfer function, the parallelization of a standard Recursive Least Squares (RLS) estimator and a GPC predictor is discussed here.
Resumo:
Evaluation of blood-flow Doppler ultrasound spectral content is currently performed on clinical diagnosis. Since mean frequency and bandwidth spectral parameters are determinants on the quantification of stenotic degree, more precise estimators than the conventional Fourier transform should be seek. This paper summarizes studies led by the author in this field, as well as the strategies used to implement the methods in real-time. Regarding stationary and nonstationary characteristics of the blood-flow signal, different models were assessed. When autoregressive and autoregressive moving average models were compared with the traditional Fourier based methods in terms of their statistical performance while estimating both spectral parameters, the Modified Covariance model was identified by the cost/benefit criterion as the estimator presenting better performance. The performance of three time-frequency distributions and the Short Time Fourier Transform was also compared. The Choi-Williams distribution proved to be more accurate than the other methods. The identified spectral estimators were developed and optimized using high performance techniques. Homogeneous and heterogeneous architectures supporting multiple instruction multiple data parallel processing were essayed. Results obtained proved that real-time implementation of the blood-flow estimators is feasible, enhancing the usage of more complex spectral models on other ultrasonic systems.
Resumo:
The UMTS turbo encoder is composed of parallel concatenation of two Recursive Systematic Convolutional (RSC) encoders which start and end at a known state. This trellis termination directly affects the performance of turbo codes. This paper presents performance analysis of multi-point trellis termination of turbo codes which is to terminate RSC encoders at more than one point of the current frame while keeping the interleaver length the same. For long interleaver lengths, this approach provides dividing a data frame into sub-frames which can be treated as independent blocks. A novel decoding architecture using multi-point trellis termination and collision-free interleavers is presented. Collision-free interleavers are used to solve memory collision problems encountered by parallel decoding of turbo codes. The proposed parallel decoding architecture reduces the decoding delay caused by the iterative nature and forward-backward metric computations of turbo decoding algorithms. Our simulations verified that this turbo encoding and decoding scheme shows Bit Error Rate (BER) performance very close to that of the UMTS turbo coding while providing almost %50 time saving for the 2-point termination and %80 time saving for the 5-point termination.
Resumo:
Turbo codes experience a significant decoding delay because of the iterative nature of the decoding algorithms, the high number of metric computations and the complexity added by the (de)interleaver. The extrinsic information is exchanged sequentially between two Soft-Input Soft-Output (SISO) decoders. Instead of this sequential process, a received frame can be divided into smaller windows to be processed in parallel. In this paper, a novel parallel processing methodology is proposed based on the previous parallel decoding techniques. A novel Contention-Free (CF) interleaver is proposed as part of the decoding architecture which allows using extrinsic Log-Likelihood Ratios (LLRs) immediately as a-priori LLRs to start the second half of the iterative turbo decoding. The simulation case studies performed in this paper show that our parallel decoding method can provide %80 time saving compared to the standard decoding and %30 time saving compared to the previous parallel decoding methods at the expense of 0.3 dB Bit Error Rate (BER) performance degradation.
Resumo:
In this paper, it is studied the dynamics of the robotic bird in terms of time response and robustness. It is analyzed the wing angle of attack and the velocity of the bird, the tail influence, the gliding flight and the flapping flight. The results are positive for the construction of flying robots. The development of computational simulation based on the dynamic of the robotic bird should allow testing strategies and different algorithms of control such as integer and fractional controllers.
Resumo:
This study addresses the optimization of rational fraction approximations for the discrete-time calculation of fractional derivatives. The article starts by analyzing the standard techniques based on Taylor series and Padé expansions. In a second phase the paper re-evaluates the problem in an optimization perspective by tacking advantage of the flexibility of the genetic algorithms.
Resumo:
This paper addresses the calculation of derivatives of fractional order for non-smooth data. The noise is avoided by adopting an optimization formulation using genetic algorithms (GA). Given the flexibility of the evolutionary schemes, a hierarchical GA composed by a series of two GAs, each one with a distinct fitness function, is established.
Resumo:
The trajectory planning of redundant robots is an important area of research and efficient optimization algorithms are needed. This paper presents a new technique that combines the closed-loop pseudoinverse method with genetic algorithms. The results are compared with a genetic algorithm that adopts the direct kinematics. In both cases the trajectory planning is formulated as an optimization problem with constraints.
Resumo:
Single processor architectures are unable to provide the required performance of high performance embedded systems. Parallel processing based on general-purpose processors can achieve these performances with a considerable increase of required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations interconnected with a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and number of interfacing ports. The architecture was tested in a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel generalpurpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
Resumo:
The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real-time, challenging the performance capabilities of current architectures. The advent of next-generation many-core embedded platforms has the chance of intercepting this converging need for predictable high-performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures integrating general-purpose processors with many-core computing fabrics. To this end, it is of paramount importance to develop new techniques for exploiting the massively parallel computation capabilities of such platforms in a predictable way. P-SOCRATES will tackle this important challenge by merging leading research groups from the HPC and EC communities. The time-criticality and parallelisation challenges common to both areas will be addressed by proposing an integrated framework for executing workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The project will investigate new HPC techniques that fulfil real-time requirements. The main sources of indeterminism will be identified, proposing efficient mapping and scheduling algorithms, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.
Resumo:
In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniques for maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables, and an approach for performing parallel addition of N input symbols.
Resumo:
In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniquesfor maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables,and an approach for performing parallel addition of N input symbols.
Resumo:
We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.
Resumo:
The (n, k)-star interconnection network was proposed in 1995 as an attractive alternative to the n-star topology in parallel computation. The (n, k )-star has significant advantages over the n-star which itself was proposed as an attractive alternative to the popular hypercube. The major advantage of the (n, k )-star network is its scalability, which makes it more flexible than the n-star as an interconnection network. In this thesis, we will focus on finding graph theoretical properties of the (n, k )-star as well as developing parallel algorithms that run on this network. The basic topological properties of the (n, k )-star are first studied. These are useful since they can be used to develop efficient algorithms on this network. We then study the (n, k )-star network from algorithmic point of view. Specifically, we will investigate both fundamental and application algorithms for basic communication, prefix computation, and sorting, etc. A literature review of the state-of-the-art in relation to the (n, k )-star network as well as some open problems in this area are also provided.
Resumo:
The (n, k)-arrangement interconnection topology was first introduced in 1992. The (n, k )-arrangement graph is a class of generalized star graphs. Compared with the well known n-star, the (n, k )-arrangement graph is more flexible in degree and diameter. However, there are few algorithms designed for the (n, k)-arrangement graph up to present. In this thesis, we will focus on finding graph theoretical properties of the (n, k)- arrangement graph and developing parallel algorithms that run on this network. The topological properties of the arrangement graph are first studied. They include the cyclic properties. We then study the problems of communication: broadcasting and routing. Embedding problems are also studied later on. These are very useful to develop efficient algorithms on this network. We then study the (n, k )-arrangement network from the algorithmic point of view. Specifically, we will investigate both fundamental and application algorithms such as prefix sums computation, sorting, merging and basic geometry computation: finding convex hull on the (n, k )-arrangement graph. A literature review of the state-of-the-art in relation to the (n, k)-arrangement network is also provided, as well as some open problems in this area.