991 resultados para parallel architecture
Resumo:
In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniques for maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables, and an approach for performing parallel addition of N input symbols.
Resumo:
In this paper we investigate various algorithms for performing Fast Fourier Transformation (FFT)/Inverse Fast Fourier Transformation (IFFT), and proper techniquesfor maximizing the FFT/IFFT execution speed, such as pipelining or parallel processing, and use of memory structures with pre-computed values (look up tables -LUT) or other dedicated hardware components (usually multipliers). Furthermore, we discuss the optimal hardware architectures that best apply to various FFT/IFFT algorithms, along with their abilities to exploit parallel processing with minimal data dependences of the FFT/IFFT calculations. An interesting approach that is also considered in this paper is the application of the integrated processing-in-memory Intelligent RAM (IRAM) chip to high speed FFT/IFFT computing. The results of the assessment study emphasize that the execution speed of the FFT/IFFT algorithms is tightly connected to the capabilities of the FFT/IFFT hardware to support the provided parallelism of the given algorithm. Therefore, we suggest that the basic Discrete Fourier Transform (DFT)/Inverse Discrete Fourier Transform (IDFT) can also provide high performances, by utilizing a specialized FFT/IFFT hardware architecture that can exploit the provided parallelism of the DFT/IDF operations. The proposed improvements include simplified multiplications over symbols given in polar coordinate system, using sinе and cosine look up tables,and an approach for performing parallel addition of N input symbols.
Resumo:
Magdeburg, Univ., Fak. für Naturwiss., Diss., 2014
Resumo:
The modern computer systems that are in use nowadays are mostly processor-dominant, which means that their memory is treated as a slave element that has one major task – to serve execution units data requirements. This organization is based on the classical Von Neumann's computer model, proposed seven decades ago in the 1950ties. This model suffers from a substantial processor-memory bottleneck, because of the huge disparity between the processor and memory working speeds. In order to solve this problem, in this paper we propose a novel architecture and organization of processors and computers that attempts to provide stronger match between the processing and memory elements in the system. The proposed model utilizes a memory-centric architecture, wherein the execution hardware is added to the memory code blocks, allowing them to perform instructions scheduling and execution, management of data requests and responses, and direct communication with the data memory blocks without using registers. This organization allows concurrent execution of all threads, processes or program segments that fit in the memory at a given time. Therefore, in this paper we describe several possibilities for organizing the proposed memory-centric system with multiple data and logicmemory merged blocks, by utilizing a high-speed interconnection switching network.
Resumo:
2
Resumo:
3
Resumo:
This paper shows how a high level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave.
Resumo:
We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.
Resumo:
Architectural design and deployment of Peer-to-Peer Video-on-Demand (P2PVoD) systems which support VCR functionalities is attracting the interest of an increasing number of research groups within the scientific community; especially due to the intrinsic characteristics of such systems and the benefits that peers could provide at reducing the server load. This work focuses on the performance analysis of a P2P-VoD system considering user behaviors obtained from real traces together with other synthetic user patterns. The experiments performed show that it is feasible to achieve a performance close to the best possible. Future work will consider monitoring the physical characteristics of the network in order to improve the design of different aspects of a VoD system.
Resumo:
Fault tolerance has become a major issue for computer and software engineers because the occurrence of faults increases the cost of using a parallel computer. RADIC is the fault tolerance architecture for message passing systems which is transparent, decentralized, flexible and scalable. This master thesis presents the methodology used to implement the RADIC architecture over Open MPI, a well-know large-used message passing library. This implementation kept the RADIC architecture characteristics. In order to validate the implementation we have executed a synthetic ping program, besides, to evaluate the implementation performance we have used the NAS Parallel Benchmarks. The results prove that the RADIC architecture performance depends on the communication pattern of the parallel application which is running. Furthermore, our implementation proves that the RADIC architecture could be implemented over an existent message passing library.
Resumo:
El presente trabajo expone los entornos de desarrollo de aplicaciones paralelas en sistemas distribuidos. Se ha dedicado especial atención a los entornos gráficos, como plataforma de acceso al cluster. Dada la cantidad de alternativas a seleccionar en este tipo de entornos, como puede ser la parametrización de las aplicaciones, especificación de la arquitectura a utilizar de los nodos, caracterización de la carga, políticas de planificación, etc, hace que este trabajo pueda servir de punto inicial de partida a usuarios que deseen iniciarse en este tipo de entornos.
Resumo:
CISNE es un sistema de cómputo en paralelo del Departamento de Arquitectura de Computadores y Sistemas Operativos (DACSO). Para poder implementar políticas de ordenacción de colas y selección de trabajos, este sistema necesita predecir el tiempo de ejecución de las aplicaciones. Con este trabajo se pretende proveer al sistema CISNE de un método para predecir el tiempo de ejecución basado en un histórico donde se almacenarán todos los datos sobre las ejecuciones.