218 resultados para 291605 Processor Architectures
Resumo:
Even research models of helicopter dynamics often lead to a large number of equations of motion with periodic coefficients; and Floquet theory is a widely used mathematical tool for dynamic analysis. Presently, three approaches are used in generating the equations of motion. These are (1) general-purpose symbolic processors such as REDUCE and MACSYMA, (2) a special-purpose symbolic processor, DEHIM (Dynamic Equations for Helicopter Interpretive Models), and (3) completely numerical approaches. In this paper, comparative aspects of the first two purely algebraic approaches are studied by applying REDUCE and DEHIM to the same set of problems. These problems range from a linear model with one degree of freedom to a mildly non-linear multi-bladed rotor model with several degrees of freedom. Further, computational issues in applying Floquet theory are also studied, which refer to (1) the equilibrium solution for periodic forced response together with the transition matrix for perturbations about that response and (2) a small number of eigenvalues and eigenvectors of the unsymmetric transition matrix. The study showed the following: (1) compared to REDUCE, DEHIM is far more portable and economical, but it is also less user-friendly, particularly during learning phases; (2) the problems of finding the periodic response and eigenvalues are well conditioned.
Resumo:
In this paper we propose a novel technique to model and ana¿ lyze the performability of parallel and distributed architectures using GSPN-reward models.
Resumo:
The design of a dual-DSP microprocessor system and its application for parallel FFT and two-dimensional convolution are explained. The system is based on a master-salve configuration. Two ADSP-2101s are configured as slave processors and a PC/AT serves as the master. The master serves as a control processor to transfer the program code and data to the DSPs. The system architecture and the algorithms for the two applications, viz. FFT and two-dimensional convolutions, are discussed.
Analyzing Cache Performance Bottlenecks of STM Applications and addressing them with Compiler's help
Resumo:
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs as an alternative to traditional lock based synchronization. However adoption of STM in mainstream software has been quite low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses experienced by the applications. Based on our analysis, we propose a compiler driven Lock-Data Colocation (LDC), targeted at reducing the cache overheads on STM. We show that LDC is effective in improving the cache behavior of STM applications by reducing the dcache miss latency and improving execution time performance.
Resumo:
In this work, we propose a new organization for the last level shared cache of a rnulticore system. Our design is based on the observation that the Next-Use distance, measured in terms of intervening misses between the eviction of a line and its next use, for lines brought in by a given delinquent PC falls within a predictable range of values. We exploit this correlation to improve the performance of shared caches in multi-core architectures by proposing the NUcache organization.
Resumo:
High performance video standards use prediction techniques to achieve high picture quality at low bit rates. The type of prediction decides the bit rates and the image quality. Intra Prediction achieves high video quality with significant reduction in bit rate. This paper present an area optimized architecture for Intra prediction, for H.264 decoding at HDTV resolution with a target of achieving 60 fps. The architecture was validated on Virtex-5 FPGA based platform. The architecture achieves a frame rate of 64 fps. The architecture is based on multi-level memory hierarchy to reduce latency and ensure optimum resources utilization. It removes redundancy by reusing same functional blocks across different modes. The proposed architecture uses only 13% of the total LUTs available on the Xilinx FPGA XC5VLX50T.
Resumo:
In this paper we present a cache coherence protocol for multistage interconnection network (MIN)-based multiprocessors with two distinct private caches: private-blocks caches (PCache) containing blocks private to a process and shared-blocks caches (SCache) containing data accessible by all processes. The architecture is extended by a coherence control bus connecting all shared-block cache controllers. Timing problems due to variable transit delays through the MIN are dealt with by introducing Transient states in the proposed cache coherence protocol. The impact of the coherence protocol on system performance is evaluated through a performance study of three phases. Assuming homogeneity of all nodes, a single-node queuing model (phase 3) is developed to analyze system performance. This model is solved for processor and coherence bus utilizations using the mean value analysis (MVA) technique with shared-blocks steady state probabilities (phase 1) and communication delays (phase 2) as input parameters. The performance of our system is compared to that of a system with an equivalent-sized unified cache and with a multiprocessor implementing a directory-based coherence protocol. System performance measures are verified through simulation.
Resumo:
A number of companies are trying to migrate large monolithic software systems to Service Oriented Architectures. A common approach to do this is to first identify and describe desired services (i.e., create a model), and then to locate portions of code within the existing system that implement the described services. In this paper we describe a detailed case study we undertook to match a model to an open-source business application. We describe the systematic methodology we used, the results of the exercise, as well as several observations that throw light on the nature of this problem. We also suggest and validate heuristics that are likely to be useful in partially automating the process of matching service descriptions to implementations.
Resumo:
The bipolar point spread function (PSF) corresponding to the Wiener filter tor correcting linear-motion-blurred pictures is implemented in a noncoherent optical processor. The following two approaches are taken for this implementation: (1) the PSF is modulated and biased so that the resulting function is non-negative and (2) the PSF is split into its positive and sign-reversed negative parts, and these two parts are dealt with separately. The phase problem associated with arriving at the pupil function from these modified PSFs is solved using both analytical and combined analytical-iterative techniques available in the literature. The designed pupil functions are experimentally implemented, and deblurring in a noncoherent processor is demonstrated. The postprocessing required (i.e., demodulation in the first approach to modulating the PSF and intensity subtraction in the second approach) are carried out either in a coherent processor or with the help of a PC-based vision system. The deblurred outputs are presented.
Resumo:
We describe a compiler for the Flat Concurrent Prolog language on a message passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules, The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for the Development of Advanced Computing, India), Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution, It is a process oriented language, which embodies dataflow synchronization and guarded-command as its basic control mechanisms. An identical algorithm is executed on every processor in the network, We assume regular network topologies like mesh, ring, etc, Each node has a local memory, The algorithm comprises of two important parts: reduction and communication, The most difficult task is to integrate the solutions of problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick, These problems include Quicksort, 8-queens, and Prime Number Generation, The results of the preliminary tests are favourable, We are currently examining issues like indexing and load balancing to further optimize our compiler.
Resumo:
This paper reports new results concerning the capabilities of a family of service disciplines aimed at providing per-connection end-to-end delay (and throughput) guarantees in high-speed networks. This family consists of the class of rate-controlled service disciplines, in which traffic from a connection is reshaped to conform to specific traffic characteristics, at every hop on its path. When used together with a scheduling policy at each node, this reshaping enables the network to provide end-to-end delay guarantees to individual connections. The main advantages of this family of service disciplines are their implementation simplicity and flexibility. On the other hand, because the delay guarantees provided are based on summing worst case delays at each node, it has also been argued that the resulting bounds are very conservative which may more than offset the benefits. In particular, other service disciplines such as those based on Fair Queueing or Generalized Processor Sharing (GPS), have been shown to provide much tighter delay bounds. As a result, these disciplines, although more complex from an implementation point-of-view, have been considered for the purpose of providing end-to-end guarantees in high-speed networks. In this paper, we show that through ''proper'' selection of the reshaping to which we subject the traffic of a connection, the penalty incurred by computing end-to-end delay bounds based on worst cases at each node can be alleviated. Specifically, we show how rate-controlled service disciplines can be designed to outperform the Rate Proportional Processor Sharing (RPPS) service discipline. Based on these findings, we believe that rate-controlled service disciplines provide a very powerful and practical solution to the problem of providing end-to-end guarantees in high-speed networks.
Resumo:
The paper examines the suitability of the generalized data rule in training artificial neural networks (ANN) for damage identification in structures. Several multilayer perceptron architectures are investigated for a typical bridge truss structure with simulated damage stares generated randomly. The training samples have been generated in terms of measurable structural parameters (displacements and strains) at suitable selected locations in the structure. Issues related to the performance of the network with reference to hidden layers and hidden. neurons are examined. Some heuristics are proposed for the design of neural networks for damage identification in structures. These are further supported by an investigation conducted on five other bridge truss configurations.
Resumo:
ASICs offer the best realization of DSP algorithms in terms of performance, but the cost is prohibitive, especially when the volumes involved are low. However, if the architecture synthesis trajectory for such algorithms is such that the target architecture can be identified as an interconnection of elementary parameterized computational structures, then it is possible to attain a close match, both in terms of performance and power with respect to an ASIC, for any algorithmic parameters of the given algorithm. Such an architecture is weakly programmable (configurable) and can be viewed as an application specific integrated processor (ASIP). In this work, we present a methodology to synthesize ASIPs for DSP algorithms. (C) 1999 Elsevier Science B.V. All rights reserved.
Resumo:
Simulation is an important means of evaluating new microarchitectures. With the invention of multi-core (CMP) platforms, simulators are becoming larger and more complex. However, with the availability of CMPs with larger caches and higher operating frequency, the wall clock time required for simulating an application has become comparatively shorter. Reducing this simulation time further is a great challenge, especially in the case of multi-threaded workload due to indeterminacy introduced due to simultaneously executing various threads. In this paper, we propose a technique for speeding multi-core simulation. The model of the processor core and cache are replaced with functional models, to achieve speedup. A timed Petri net model is used to estimate the execution time of the processor and the memory access latencies are estimated using hit/miss information obtained from the functional model of the cache. This model can be used to predict performance of data parallel applications or multiprogramming workload on CMP platform with various cache hierarchies and shared bus interconnect. The error in estimation of the execution time of an application is within 6%. The speedup achieved ranges between an average of 2x--4x over the cycle accurate simulator.
Resumo:
In this paper, we consider the application of belief propagation (BP) to achieve near-optimal signal detection in large multiple-input multiple-output (MIMO) systems at low complexities. Large-MIMO architectures based on spatial multiplexing (V-BLAST) as well as non-orthogonal space-time block codes(STBC) from cyclic division algebra (CDA) are considered. We adopt graphical models based on Markov random fields (MRF) and factor graphs (FG). In the MRF based approach, we use pairwise compatibility functions although the graphical models of MIMO systems are fully/densely connected. In the FG approach, we employ a Gaussian approximation (GA) of the multi-antenna interference, which significantly reduces the complexity while achieving very good performance for large dimensions. We show that i) both MRF and FG based BP approaches exhibit large-system behavior, where increasingly closer to optimal performance is achieved with increasing number of dimensions, and ii) damping of messages/beliefs significantly improves the bit error performance.