862 resultados para bit-parallel
Resumo:
This paper presents the design of a high-speed coprocessor for Elliptic Curve Cryptography over binary Galois Field (ECC- GF(2m)). The purpose of our coprocessor is to accelerate the scalar multiplication performed over elliptic curve points represented by affine coordinates in polynomial basis. Our method consists of using elliptic curve parameters over GF(2163) in accordance with international security requirements to implement a bit-parallel coprocessor on field-programmable gate-array (FPGA). Our coprocessor performs modular inversion by using a process based on the Stein's algorithm. Results are presented and compared to results of other related works. We conclude that our coprocessor is suitable for comparing with any other ECC-hardware proposal, since its speed is comparable to projective coordinate designs.
Resumo:
This paper describes new improvements for BB-MaxClique (San Segundo et al. in Comput Oper Resour 38(2):571–581, 2011 ), a leading maximum clique algorithm which uses bit strings to efficiently compute basic operations during search by bit masking. Improvements include a recently described recoloring strategy in Tomita et al. (Proceedings of the 4th International Workshop on Algorithms and Computation. Lecture Notes in Computer Science, vol 5942. Springer, Berlin, pp 191–203, 2010 ), which is now integrated in the bit string framework, as well as different optimization strategies for fast bit scanning. Reported results over DIMACS and random graphs show that the new variants improve over previous BB-MaxClique for a vast majority of cases. It is also established that recoloring is mainly useful for graphs with high densities.
Resumo:
La multiplication dans le corps de Galois à 2^m éléments (i.e. GF(2^m)) est une opérations très importante pour les applications de la théorie des correcteurs et de la cryptographie. Dans ce mémoire, nous nous intéressons aux réalisations parallèles de multiplicateurs dans GF(2^m) lorsque ce dernier est généré par des trinômes irréductibles. Notre point de départ est le multiplicateur de Montgomery qui calcule A(x)B(x)x^(-u) efficacement, étant donné A(x), B(x) in GF(2^m) pour u choisi judicieusement. Nous étudions ensuite l'algorithme diviser pour régner PCHS qui permet de partitionner les multiplicandes d'un produit dans GF(2^m) lorsque m est impair. Nous l'appliquons pour la partitionnement de A(x) et de B(x) dans la multiplication de Montgomery A(x)B(x)x^(-u) pour GF(2^m) même si m est pair. Basé sur cette nouvelle approche, nous construisons un multiplicateur dans GF(2^m) généré par des trinôme irréductibles. Une nouvelle astuce de réutilisation des résultats intermédiaires nous permet d'éliminer plusieurs portes XOR redondantes. Les complexités de temps (i.e. le délais) et d'espace (i.e. le nombre de portes logiques) du nouveau multiplicateur sont ensuite analysées: 1. Le nouveau multiplicateur demande environ 25% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito lorsque GF(2^m) est généré par des trinômes irréductible et m est suffisamment grand. Le nombre de portes du nouveau multiplicateur est presque identique à celui du multiplicateur de Karatsuba proposé par Elia. 2. Le délai de calcul du nouveau multiplicateur excède celui des meilleurs multiplicateurs d'au plus deux évaluations de portes XOR. 3. Nous determinons le délai et le nombre de portes logiques du nouveau multiplicateur sur les deux corps de Galois recommandés par le National Institute of Standards and Technology (NIST). Nous montrons que notre multiplicateurs contient 15% moins de portes logiques que les multiplicateurs de Montgomery et de Mastrovito au coût d'un délai d'au plus une porte XOR supplémentaire. De plus, notre multiplicateur a un délai d'une porte XOR moindre que celui du multiplicateur d'Elia au coût d'une augmentation de moins de 1% du nombre total de portes logiques.
Resumo:
Efficient hardware implementations of arithmetic operations in the Galois field are highly desirable for several applications, such as coding theory, computer algebra and cryptography. Among these operations, multiplication is of special interest because it is considered the most important building block. Therefore, high-speed algorithms and hardware architectures for computing multiplication are highly required. In this paper, bit-parallel polynomial basis multipliers over the binary field GF(2(m)) generated using type II irreducible pentanomials are considered. The multiplier here presented has the lowest time complexity known to date for similar multipliers based on this type of irreducible pentanomials.
Resumo:
This paper describes JERIM-320, a new 320-bit hash function used for ensuring message integrity and details a comparison with popular hash functions of similar design. JERIM-320 and FORK -256 operate on four parallel lines of message processing while RIPEMD-320 operates on two parallel lines. Popular hash functions like MD5 and SHA-1 use serial successive iteration for designing compression functions and hence are less secure. The parallel branches help JERIM-320 to achieve higher level of security using multiple iterations and processing on the message blocks. The focus of this work is to prove the ability of JERIM 320 in ensuring the integrity of messages to a higher degree to suit the fast growing internet applications
Resumo:
For single-user MIMO communication with uncoded and coded QAM signals, we propose bit and power loading schemes that rely only on channel distribution information at the transmitter. To that end, we develop the relationship between the average bit error probability at the output of a ZF linear receiver and the bit rates and powers allocated at the transmitter. This relationship, and the fact that a ZF receiver decouples the MIMO parallel channels, allow leveraging bit loading algorithms already existing in the literature. We solve dual bit rate maximization and power minimization problems and present performance resultsthat illustrate the gains of the proposed scheme with respect toa non-optimized transmission.
Resumo:
A parallel pipelined array of cells suitable for realtime computation of histograms is proposed. The cell architecture builds on previous work to now allow operating on a stream of data at 1 pixel per clock cycle. This new cell is more suitable for interfacing to camera sensors or to microprocessors of 8-bit data buses which are common in consumer digital cameras. Arrays using the new proposed cells are obtained via C-slow retiming techniques and can be clocked at a 65% faster frequency than previous arrays. This achieves over 80% of the performance of two-pixel per clock cycle parallel pipelined arrays.
Resumo:
A parallel pipelined array of cells suitable for real-time computation of histograms is proposed. The cell architecture builds on previous work obtained via C-slow retiming techniques and can be clocked at 65 percent faster frequency than previous arrays. The new arrays can be exploited for higher throughput particularly when dual data rate sampling techniques are used to operate on single streams of data from image sensors. In this way, the new cell operates on a p-bit data bus which is more convenient for interfacing to camera sensors or to microprocessors in consumer digital cameras.
Resumo:
We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations. The modified algorithm runs more than 50 times faster on the CELL’s Synergistic Processing Elements than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
Resumo:
The time to process each of W/B processing blocks of a median calculation method on a set of N W-bit integers is improved here by a factor of three compared to the literature. Parallelism uncovered in blocks containing B-bit slices are exploited by independent accumulative parallel counters so that the median is calculated faster than any known previous method for any N, W values. The improvements to the method are discussed in the context of calculating the median for a moving set of N integers for which a pipelined architecture is developed. An extra benefit of smaller area for the architecture is also reported.
Resumo:
A parallel formulation for the simulation of a branch prediction algorithm is presented. This parallel formulation identifies independent tasks in the algorithm which can be executed concurrently. The parallel implementation is based on the multithreading model and two parallel programming platforms: pthreads and Cilk++. Improvement in execution performance by up to 7 times is observed for a generic 2-bit predictor in a 12-core multiprocessor system.
Resumo:
This paper describes a fast integer sorting algorithm, herein referred as Bit-index sort, which is a non-comparison sorting algorithm for partial per-mutations, with linear complexity order in execution time. Bit-index sort uses a bit-array to classify input sequences of distinct integers, and exploits built-in bit functions in C compilers supported by machine hardware to retrieve the ordered output sequence. Results show that Bit-index sort outperforms in execution time to quicksort and counting sort algorithms. A parallel approach for Bit-index sort using two simultaneous threads is included, which obtains speedups up to 1.6.
Resumo:
Orthogonal Frequency-Division Multiplexing (OFDM) has been proved to be a promising technology that enables the transmission of higher data rate. Multicarrier Code-Division Multiple Access (MC-CDMA) is a transmission technique which combines the advantages of both OFDM and Code-Division Multiplexing Access (CDMA), so as to allow high transmission rates over severe time-dispersive multi-path channels without the need of a complex receiver implementation. Also MC-CDMA exploits frequency diversity via the different subcarriers, and therefore allows the high code rates systems to achieve good Bit Error Rate (BER) performances. Furthermore, the spreading in the frequency domain makes the time synchronization requirement much lower than traditional direct sequence CDMA schemes. There are still some problems when we use MC-CDMA. One is the high Peak-to-Average Power Ratio (PAPR) of the transmit signal. High PAPR leads to nonlinear distortion of the amplifier and results in inter-carrier self-interference plus out-of-band radiation. On the other hand, suppressing the Multiple Access Interference (MAI) is another crucial problem in the MC-CDMA system. Imperfect cross-correlation characteristics of the spreading codes and the multipath fading destroy the orthogonality among the users, and then cause MAI, which produces serious BER degradation in the system. Moreover, in uplink system the received signals at a base station are always asynchronous. This also destroys the orthogonality among the users, and hence, generates MAI which degrades the system performance. Besides those two problems, the interference should always be considered seriously for any communication system. In this dissertation, we design a novel MC-CDMA system, which has low PAPR and mitigated MAI. The new Semi-blind channel estimation and multi-user data detection based on Parallel Interference Cancellation (PIC) have been applied in the system. The Low Density Parity Codes (LDPC) has also been introduced into the system to improve the performance. Different interference models are analyzed in multi-carrier communication systems and then the effective interference suppression for MC-CDMA systems is employed in this dissertation. The experimental results indicate that our system not only significantly reduces the PAPR and MAI but also effectively suppresses the outside interference with low complexity. Finally, we present a practical cognitive application of the proposed system over the software defined radio platform.