818 resultados para LDPC, CUDA, GPGPU, computing, GPU, DVB, S2, SDR
Resumo:
An important issue in the design of a distributed computing system (DCS) is the development of a suitable protocol. This paper presents an effort to systematize the protocol design procedure for a DCS. Protocol design and development can be divided into six phases: specification of the DCS, specification of protocol requirements, protocol design, specification and validation of the designed protocol, performance evaluation, and hardware/software implementation. This paper describes techniques for the second and third phases, while the first phase has been considered by the authors in their earlier work. Matrix and set theoretic based approaches are used for specification of a DCS and for specification of the protocol requirements. These two formal specification techniques form the basis of the development of a simple and straightforward procedure for the design of the protocol. The applicability of the above design procedure has been illustrated by considering an example of a computing system encountered on board a spacecraft. A Petri-net based approach has been adopted to model the protocol. The methodology developed in this paper can be used in other DCS applications.
Resumo:
A fuzzy system is developed using a linearized performance model of the gas turbine engine for performing gas turbine fault isolation from noisy measurements. By using a priori information about measurement uncertainties and through design variable linking, the design of the fuzzy system is posed as an optimization problem with low number of design variables which can be solved using the genetic algorithm in considerably low amount of computer time. The faults modeled are module faults in five modules: fan, low pressure compressor, high pressure compressor, high pressure turbine and low pressure turbine. The measurements used are deviations in exhaust gas temperature, low rotor speed, high rotor speed and fuel flow from a base line 'good engine'. The genetic fuzzy system (GFS) allows rapid development of the rule base if the fault signatures and measurement uncertainties change which happens for different engines and airlines. In addition, the genetic fuzzy system reduces the human effort needed in the trial and error process used to design the fuzzy system and makes the development of such a system easier and faster. A radial basis function neural network (RBFNN) is also used to preprocess the measurements before fault isolation. The RBFNN shows significant noise reduction and when combined with the GFS leads to a diagnostic system that is highly robust to the presence of noise in data. Showing the advantage of using a soft computing approach for gas turbine diagnostics.
Resumo:
A symmetric solution X satisfying the matrix equation XA = AtX is called a symmetrizer of the matrix A. A general algorithm to compute a matrix symmetrizer is obtained. A new multiple-modulus residue arithmetic called floating-point modular arithmetic is described and implemented on the algorithm to compute an error-free matrix symmetrizer.
Resumo:
A real or a complex symmetric matrix is defined here as an equivalent symmetric matrix for a real nonsymmetric matrix if both have the same eigenvalues. An equivalent symmetric matrix is useful in computing the eigenvalues of a real nonsymmetric matrix. A procedure to compute equivalent symmetric matrices and its mathematical foundation are presented.
Resumo:
The management and coordination of business-process collaboration experiences changes because of globalization, specialization, and innovation. Service-oriented computing (SOC) is a means towards businessprocess automation and recently, many industry standards emerged to become part of the service-oriented architecture (SOA) stack. In a globalized world, organizations face new challenges for setting up and carrying out collaborations in semi-automating ecosystems for business services. For being efficient and effective, many companies express their services electronically in what we term business-process as a service (BPaaS). Companies then source BPaaS on the fly from third parties if they are not able to create all service-value inhouse because of reasons such as lack of reasoures, lack of know-how, cost- and time-reduction needs. Thus, a need emerges for BPaaS-HUBs that not only store service offers and requests together with information about their issuing organizations and assigned owners, but that also allow an evaluation of trust and reputation in an anonymized electronic service marketplace. In this paper, we analyze the requirements, design architecture and system behavior of such a BPaaS-HUB to enable a fast setup and enactment of business-process collaboration. Moving into a cloud-computing setting, the results of this paper allow system designers to quickly evaluate which services they need for instantiationg the BPaaS-HUB architecture. Furthermore, the results also show what the protocol of a backbone service bus is that allows a communication between services that implement the BPaaS-HUB. Finally, the paper analyzes where an instantiation must assign additional computing resources vor the avoidance of performance bottlenecks.
Resumo:
A Geodesic Constant Method (GCM) is outlined which provides a common approach to ray tracing on quadric cylinders in general, and yields all the surface ray-geometric parameters required in the UTD mutual coupling analysis of conformal antenna arrays in the closed form. The approach permits the incorporation of a shaping parameter which permits the modeling of quadric cylindrical surfaces of desired sharpness/flatness with a common set of equations. The mutual admittance between the slots on a general parabolic cylinder is obtained as an illustration of the applicability of the GCM.
Resumo:
We address the problem of computing the level-crossings of an analog signal from samples measured on a uniform grid. Such a problem is important, for example, in multilevel analog-to-digital (A/D) converters. The first operation in such sampling modalities is a comparator, which gives rise to a bilevel waveform. Since bilevel signals are not bandlimited, measuring the level-crossing times exactly becomes impractical within the conventional framework of Shannon sampling. In this paper, we propose a novel sub-Nyquist sampling technique for making measurements on a uniform grid and thereby for exactly computing the level-crossing times from those samples. The computational complexity of the technique is low and comprises simple arithmetic operations. We also present a finite-rate-of-innovation sampling perspective of the proposed approach and also show how exponential splines fit in naturally into the proposed sampling framework. We also discuss some concrete practical applications of the sampling technique.
Resumo:
Floquet analysis is widely used for small-order systems (say, order M < 100) to find trim results of control inputs and periodic responses, and stability results of damping levels and frequencies, Presently, however, it is practical neither for design applications nor for comprehensive analysis models that lead to large systems (M > 100); the run time on a sequential computer is simply prohibitive, Accordingly, a massively parallel Floquet analysis is developed with emphasis on large systems, and it is implemented on two SIMD or single-instruction, multiple-data computers with 4096 and 8192 processors, The focus of this development is a parallel shooting method with damped Newton iteration to generate trim results; the Floquet transition matrix (FTM) comes out as a byproduct, The eigenvalues and eigenvectors of the FTM are computed by a parallel QR method, and thereby stability results are generated, For illustration, flap and flap-lag stability of isolated rotors are treated by the parallel analysis and by a corresponding sequential analysis with the conventional shooting and QR methods; linear quasisteady airfoil aerodynamics and a finite-state three-dimensional wake model are used, Computational reliability is quantified by the condition numbers of the Jacobian matrices in Newton iteration, the condition numbers of the eigenvalues and the residual errors of the eigenpairs, and reliability figures are comparable in both the parallel and sequential analyses, Compared to the sequential analysis, the parallel analysis reduces the run time of large systems dramatically, and the reduction increases with increasing system order; this finding offers considerable promise for design and comprehensive-analysis applications.
Resumo:
In this paper, we have developed a method to compute fractal dimension (FD) of discrete time signals, in the time domain, by modifying the box-counting method. The size of the box is dependent on the sampling frequency of the signal. The number of boxes required to completely cover the signal are obtained at multiple time resolutions. The time resolutions are made coarse by decimating the signal. The loglog plot of total number of boxes required to cover the curve versus size of the box used appears to be a straight line, whose slope is taken as an estimate of FD of the signal. The results are provided to demonstrate the performance of the proposed method using parametric fractal signals. The estimation accuracy of the method is compared with that of Katz, Sevcik, and Higuchi methods. In ddition, some properties of the FD are discussed.
Resumo:
Biomedical engineering solutions like surgical simulators need High Performance Computing (HPC) to achieve real-time performance. Graphics Processing Units (GPUs) offer HPC capabilities at low cost and low power consumption. In this work, it is demonstrated that a liver which is discretized by about 2500 finite element nodes, can be graphically simulated in realtime, by making use of a GPU. Present work takes into consideration the time needed for the data transfer from CPU to GPU and back from GPU to CPU. Although behaviour of liver is very complicated, present computer simulation assumes linear elastostatics. One needs to use the commercial software ANSYS to obtain the global stiffness matrix of the liver. Results show that GPUs are useful for the real-time graphical simulation of liver, which in turn is needed in simulators that are used for training surgeons in laparoscopic surgery. Although the computer simulation should involve rendering also, neither rendering, nor the time needed for rendering and displaying the liver on a screen, is considered in the present work. The present work is just a demonstration of a concept; the concept is not really implemented and validated. Future work is to develop software which can accomplish real-time and very realistic graphical simulation of liver, with rendered image of liver on the screen changing in real-time according to the position of the surgical tool tip approximated as the mouse cursor in 3D.
Resumo:
In this work, we evaluate performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as 2-dimensional matrices of single-precision floating-point values using O(n4) operations involving dot-products and additions. We implement this algorithm on a nVidia GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors, using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the nVidia GPU suffers from: (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best performing versions on the Power7, Nehalem, and GTX 285 run in 1.02s, 1.82s, and 1.75s, respectively. These results conclusively demonstrate that, under certain conditions, it is possible for a FLOP-intensive structured application running on a multi-core processor to match or even beat the performance of an equivalent GPU version.
Resumo:
MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.