999 results for Program processors
Abstract:
This paper presents a single-precision floating-point arithmetic unit that supports multiplication, addition, fused multiply-add, reciprocal, square root, and inverse square root with high performance and low resource usage. The design uses a piecewise 2nd-order polynomial approximation to implement the reciprocal, square root, and inverse square root. The unit can be configured with any number of these operations and can compute any of them with a throughput of one operation per cycle. The unit's floating-point multiplier is also used to implement the polynomial approximation and the fused multiply-add operation. We have compared our implementation with other state-of-the-art proposals, including the Xilinx Core-Gen operators, and conclude that the approach achieves high performance/area efficiency. © 2014 Technical University of Munich (TUM).
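As a rough illustration of the approximation scheme named in this abstract, the sketch below fits one 2nd-order polynomial per segment of the mantissa range [1, 2) and uses it to approximate the reciprocal. The segment count, fitting method, and function names are illustrative assumptions; the paper's fixed-point hardware tables are not reproduced here.

```python
# Minimal software sketch (not the paper's hardware design): piecewise
# 2nd-order polynomial approximation of 1/x over the mantissa range [1, 2).
import numpy as np

SEGMENTS = 32  # hypothetical number of equal-width intervals


def build_tables(segments=SEGMENTS):
    """Fit one quadratic per equal-width segment of [1, 2) by least squares."""
    edges = np.linspace(1.0, 2.0, segments + 1)
    tables = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xs = np.linspace(lo, hi, 64)
        tables.append(np.polyfit(xs, 1.0 / xs, 2))  # coefficients [c2, c1, c0]
    return tables


def approx_reciprocal(x, tables):
    """Approximate 1/x for x in [1, 2) using the per-segment quadratics."""
    idx = min(int((x - 1.0) * len(tables)), len(tables) - 1)
    return np.polyval(tables[idx], x)


if __name__ == "__main__":
    tables = build_tables()
    xs = np.linspace(1.0, 1.999, 1001)
    err = max(abs(approx_reciprocal(x, tables) - 1.0 / x) for x in xs)
    print(f"max abs error over [1, 2): {err:.2e}")
```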
Abstract:
Systems based on artificial neural networks achieve high computational rates through the use of a massive number of simple processing elements and a high degree of connectivity between these elements. Neural networks with feedback connections provide a computing model capable of solving a large class of optimization problems. This paper presents a novel approach for solving dynamic programming problems using artificial neural networks. More specifically, a modified Hopfield network is developed and its internal parameters are computed using the valid-subspace technique. These parameters guarantee the convergence of the network to equilibrium points that represent solutions (not necessarily optimal) to the dynamic programming problem. Simulated examples are presented and compared with other neural networks. The results demonstrate that the proposed method gives a significant improvement.
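For readers unfamiliar with the underlying model, the sketch below runs generic discrete Hopfield dynamics and shows the energy settling towards an equilibrium point. Mapping a dynamic programming problem onto the weights via the valid-subspace technique, which is the paper's contribution, is not reproduced; all parameters here are illustrative.

```python
# Generic discrete Hopfield dynamics: asynchronous threshold updates.
# With symmetric weights and zero diagonal the energy never increases,
# so the state settles into an equilibrium point.
import numpy as np


def energy(W, theta, s):
    # E(s) = -1/2 s^T W s + theta^T s
    return -0.5 * s @ W @ s + theta @ s


def run_hopfield(W, theta, s, sweeps=50, rng=None):
    """Asynchronous updates: s_i <- sign(W_i . s - theta_i)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(s)
    for _ in range(sweeps):
        for i in rng.permutation(n):
            s[i] = 1.0 if W[i] @ s - theta[i] >= 0 else -1.0
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 16
    A = rng.normal(size=(n, n))
    W = (A + A.T) / 2.0           # symmetric weights
    np.fill_diagonal(W, 0.0)      # zero self-connections
    theta = rng.normal(size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    e0 = energy(W, theta, s)
    s = run_hopfield(W, theta, s, rng=rng)
    print(f"energy: {e0:.3f} -> {energy(W, theta, s):.3f}")
```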
Abstract:
A digital-desk pilot program in Brazil, part of the One Laptop Per Child (OLPC) initiative, uses a unique display design to provide an interactive interface intended to enhance education and minimize ergonomic concerns. The one-to-one computing strategy proposed by Nicholas Negroponte is a way of circumventing the tragedy of the locked computer lab, because it gives children full access to computers at any time. The OLPC program has focused on a solution that minimizes power consumption, which also limits the display's maximum size and processor performance, because the LCD backlight accounts for a significant part of a laptop's power consumption. The government has also developed a new type of low-cost tablet based on a resistive principle. Transparencies in the 90% range can be obtained in the tablet, while robustness is guaranteed by the outstanding tribological characteristics of SnO2 on glass.
Abstract:
SAFT techniques are based on the sequential activation, in emission and reception, of the array elements and on post-processing of all the received signals to compose the image. Image generation can thus be divided into two stages: (1) the excitation and acquisition stage, in which the signals received by each element or group of elements are stored; and (2) the beamforming stage, in which the signals are combined to obtain the image pixels. Graphics Processing Units (GPUs), which are programmable devices with a high level of parallelism, can accelerate the computation of the beamforming process, which usually includes functions such as dynamic focusing, band-pass filtering, spatial filtering, and envelope detection. This work shows that GPU technology can accelerate the beamforming and post-processing algorithms in SAFT imaging by more than one order of magnitude with respect to CPU implementations. ©2009 IEEE.
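The beamforming stage described above is essentially a delay-and-sum over the stored A-scans. The NumPy sketch below shows that core computation under assumed array geometry, sampling rate, and sound speed; a GPU port (e.g. in CUDA or CuPy) would parallelise the per-pixel loops, which is where the reported speed-up comes from.

```python
# Delay-and-sum beamforming over a full set of emit/receive A-scans.
# Geometry and acquisition parameters are illustrative assumptions.
import numpy as np


def saft_das(signals, elem_x, pix_x, pix_z, c=1540.0, fs=40e6):
    """signals[tx, rx, t]: A-scan acquired with element tx firing, rx receiving.
    elem_x: element x-positions (m); pix_x, pix_z: pixel grid coordinates (m).
    Returns a (len(pix_z), len(pix_x)) image before envelope detection."""
    n_el, _, n_samp = signals.shape
    img = np.zeros((len(pix_z), len(pix_x)))
    rx_ids = np.arange(n_el)
    for iz, z in enumerate(pix_z):
        for ix, x in enumerate(pix_x):
            d = np.hypot(elem_x - x, z)          # one-way element-to-pixel distances
            acc = 0.0
            for tx in range(n_el):
                # round-trip delay tx -> pixel -> rx, converted to a sample index
                idx = np.round((d[tx] + d) / c * fs).astype(int)
                valid = idx < n_samp
                acc += signals[tx, rx_ids[valid], idx[valid]].sum()
            img[iz, ix] = acc
    return img
```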
Abstract:
In this article we explore the computational power of NVIDIA graphics processing units (GPUs) for cryptography using the CUDA (Compute Unified Device Architecture) technology. CUDA makes general-purpose computing easier by exposing the parallel processing capabilities present in GPUs. To this end, the NVIDIA GPU architectures and CUDA are presented, along with cryptography concepts. Furthermore, we compare CPU implementations of the cryptographic algorithms Advanced Encryption Standard (AES) and Message-Digest Algorithm 5 (MD5) with parallel versions written in CUDA. © 2011 AISTI.
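As a point of reference for the kind of CPU-versus-GPU comparison described, the sketch below times a plain CPU baseline that hashes a batch of messages with MD5 from Python's standard library; in a CUDA port each message would map naturally to one thread, which is not reproduced here. Batch size and message length are illustrative.

```python
# CPU-side MD5 baseline: hash a batch of fixed-size messages and report throughput.
import hashlib
import os
import time


def md5_batch(messages):
    """Hash each message independently; independence is what makes the
    workload a natural fit for one-thread-per-message GPU parallelism."""
    return [hashlib.md5(m).hexdigest() for m in messages]


if __name__ == "__main__":
    messages = [os.urandom(64) for _ in range(100_000)]
    t0 = time.perf_counter()
    digests = md5_batch(messages)
    dt = time.perf_counter() - t0
    print(f"{len(digests)} digests in {dt:.3f} s "
          f"({len(digests) / dt:,.0f} hashes/s)")
```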
Abstract:
The computer systems in use today are mostly processor-dominant, meaning that memory is treated as a slave element whose single major task is to serve the data requirements of the execution units. This organization is based on the classical von Neumann computer model, proposed seven decades ago, in the 1950s. The model suffers from a substantial processor-memory bottleneck because of the huge disparity between processor and memory speeds. To address this problem, in this paper we propose a novel architecture and organization of processors and computers that attempts to provide a stronger match between the processing and memory elements in the system. The proposed model utilizes a memory-centric architecture, in which execution hardware is added to the memory code blocks, allowing them to perform instruction scheduling and execution, manage data requests and responses, and communicate directly with the data memory blocks without using registers. This organization allows concurrent execution of all threads, processes, or program segments that fit in memory at a given time. We therefore describe several possibilities for organizing the proposed memory-centric system with multiple data blocks and merged logic-memory blocks, utilizing a high-speed interconnection switching network.
Abstract:
In security and surveillance there is an increasing need to process image data efficiently and effectively, either at the source or in a large data network. Whilst the Field-Programmable Gate Array (FPGA) has been seen as a key technology for enabling this, the design process has been viewed as problematic in terms of the time and effort needed for implementation and verification. The work here proposes a different approach, using optimized FPGA-based soft-core processors that allow the user to exploit task- and data-level parallelism to achieve the quality of dedicated FPGA implementations whilst reducing design time. The paper also reports some preliminary progress on the design flow used to program the structure. An implementation of a Histogram of Gradients algorithm is also reported, showing that a performance of 328 fps can be achieved with this design approach, whilst avoiding the long design, verification, and debugging steps associated with conventional FPGA implementations.
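For context on the reported algorithm, the sketch below computes the cell-level orientation histograms at the core of a Histogram of Gradients descriptor, assuming 8x8 cells and 9 unsigned orientation bins; it says nothing about the soft-core FPGA mapping the paper is actually about.

```python
# Cell-level Histogram of Gradients: gradient-magnitude-weighted orientation
# histograms over fixed-size cells. Cell size and bin count are assumptions.
import numpy as np


def hog_cells(image, cell=8, bins=9):
    """image: 2D array. Returns histograms with shape (H//cell, W//cell, bins)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned gradients
    h, w = image.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            hist[i, j], _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
    return hist


if __name__ == "__main__":
    frame = np.random.default_rng(0).random((128, 128))
    print(hog_cells(frame).shape)   # (16, 16, 9)
```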
Abstract:
The difficulties encountered in implementing large-scale CM codes on multiprocessor systems are now fairly well understood. Despite the claims of shared-memory architecture manufacturers to provide effective parallelizing compilers, these have not proved adequate for large or complex programs. Significant programmer effort is usually required to achieve reasonable parallel efficiency on significant numbers of processors. The paradigm of Single Program Multiple Data (SPMD) domain decomposition with message passing, where each processor runs the same code on a subdomain of the problem and communicates through the exchange of messages, has for some time been demonstrated to provide the required level of efficiency, scalability, and portability across both shared- and distributed-memory systems, without the need to re-author the code in a new language or to support differing message-passing implementations. Extension of the methods into three dimensions has been enabled through the engineering of PHYSICA, a framework supporting 3D, unstructured-mesh continuum mechanics modeling. In PHYSICA, six inspectors are used. Part of the challenge in automating parallelization is proving the equivalence of inspectors so that they can be merged into as few as possible.
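The SPMD message-passing pattern described above can be illustrated with a small halo-exchange loop: every rank runs the same script on its own slab of a 1D domain and swaps ghost values with its neighbours each sweep. The sketch assumes mpi4py is available and is generic, not tied to PHYSICA or its inspectors.

```python
# SPMD 1D domain decomposition with halo exchange (run with e.g.
# `mpiexec -n 4 python spmd_halo.py`). Problem size and sweep count are illustrative.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000 // size           # assume the global size divides evenly
u = np.zeros(n_local + 2)        # one ghost cell on each side
if rank == 0:
    u[0] = 1.0                   # fixed left boundary value

for _ in range(200):
    # halo exchange with neighbours; pairwise sendrecv keeps it deadlock-free
    if rank > 0:
        u[0] = comm.sendrecv(u[1], dest=rank - 1, source=rank - 1)
    if rank < size - 1:
        u[-1] = comm.sendrecv(u[-2], dest=rank + 1, source=rank + 1)
    # local Jacobi-style relaxation on this rank's subdomain
    u[1:-1] = 0.5 * (u[:-2] + u[2:])

print(f"rank {rank}: first interior value {u[1]:.4f}")
```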
Abstract:
Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks and their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, simply by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, the most popular model for shared-memory parallel programming, as the main competitor to GPRM for solving three well-known problems on both platforms: LU Factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit GPRM's model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for LU Factorisation results in a notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM's task creation and distribution for very short computations using the Image Convolution benchmark, and show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List Processing and performs better than the OpenMP implementations on the Xeon Phi. These results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
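The granularity argument in this abstract, that combining smaller tasks into larger ones amortises task-creation overhead, can be illustrated independently of GPRM. The sketch below groups the rows of an image convolution into chunks before handing them to a thread pool; the chunk size and kernel are illustrative, and Python threads are used only for brevity.

```python
# Chunked row-wise filtering: one task per block of rows rather than one per row,
# so scheduling overhead is amortised over more work.
from concurrent.futures import ThreadPoolExecutor
import numpy as np


def filter_rows(image, kernel, row_slice):
    """Naive horizontal filtering of a block of rows (kernel applied unflipped,
    which equals convolution for the symmetric kernel used below)."""
    pad = len(kernel) // 2
    block = np.pad(image[row_slice], ((0, 0), (pad, pad)), mode="edge")
    out = np.zeros_like(image[row_slice], dtype=float)
    for k, w in enumerate(kernel):
        out += w * block[:, k:k + image.shape[1]]
    return row_slice, out


def parallel_filter(image, kernel, workers=4, chunk_rows=64):
    out = np.empty(image.shape, dtype=float)
    slices = [slice(r, min(r + chunk_rows, image.shape[0]))
              for r in range(0, image.shape[0], chunk_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for sl, block in pool.map(lambda s: filter_rows(image, kernel, s), slices):
            out[sl] = block
    return out


if __name__ == "__main__":
    img = np.random.default_rng(0).random((1024, 1024))
    blurred = parallel_filter(img, np.array([0.25, 0.5, 0.25]))
    print(blurred.shape)
```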
Abstract:
The big data era has dramatically transformed our lives; however, security incidents such as data breaches can put sensitive data (e.g. photos, identities, genomes) at risk. To protect users' data privacy, there is a growing interest in building secure cloud computing systems, which keep sensitive data inputs hidden even from the computation providers. Conceptually, secure cloud computing systems leverage cryptographic techniques (e.g. secure multiparty computation) and trusted hardware (e.g. secure processors) to instantiate a “secure” abstract machine consisting of a CPU and encrypted memory, so that an adversary cannot learn information either through the computation within the CPU or from the data in the memory. Unfortunately, evidence has shown that side channels (e.g. memory accesses, timing, and termination) in such a “secure” abstract machine may leak highly sensitive information, including the cryptographic keys that form the root of trust for these secure systems. This thesis broadly expands the investigation of a research direction called trace oblivious computation, in which programming language techniques are employed to prevent side-channel information leakage. We demonstrate the feasibility of trace oblivious computation by formalizing and building several systems: GhostRider, a hardware-software co-design that provides a hardware-based trace oblivious computing solution; SCVM, an automatic RAM-model secure computation system; and ObliVM, a programming framework that helps programmers develop such applications. All of these systems enjoy formal security guarantees while performing one to several orders of magnitude better than prior systems.
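As a toy illustration of the trace-oblivious idea, the sketch below computes the maximum of a secret list with a fixed, data-independent memory-access pattern and no secret-dependent branching, using arithmetic selection instead of an if statement. Python offers no real constant-time guarantees, so this only conveys the access-pattern discipline, not a secure implementation, and it is unrelated to the specific systems named above.

```python
# Data-oblivious linear scan: every element is touched exactly once, in order,
# and the running maximum is updated without a secret-dependent branch.

def oblivious_max(values):
    best = values[0]
    for v in values[1:]:
        greater = int(v > best)                 # 0 or 1; no control-flow branch on secrets
        best = greater * v + (1 - greater) * best
    return best


if __name__ == "__main__":
    print(oblivious_max([13, 7, 42, 5]))        # 42
```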
Abstract:
To investigate the effects of a specific protocol of undulatory physical resistance training on maximal strength gains in elderly type 2 diabetics. The study included 48 subjects, aged between 60 and 85 years, of both genders. They were divided into two groups: Untrained Diabetic Elderly (n=19), comprising those who were not subjected to physical training, and Trained Diabetic Elderly (n=29), comprising those who underwent undulatory physical resistance training. The participants were evaluated on several types of resistance training equipment before and after the training protocol, using the one-repetition maximum test. The subjects trained three times per week for a period of 16 weeks. The overload used in the undulatory resistance training was equivalent to 50% and 70% of the one-repetition maximum, alternating weekly. Statistical analysis revealed significant differences (p<0.05) between the pre-test and the post-test over the 16-week period. The average strength gains were 43.20% (knee extension), 65.00% (knee flexion), 27.80% (supine sitting machine), 31.00% (seated row), 43.90% (biceps pulley), and 21.10% (triceps pulley). Undulatory resistance training with different weekly overloads was effective in providing significant gains in maximal strength in elderly type 2 diabetic individuals.
Abstract:
We present a computer program developed for estimating penetrance rates in autosomal dominant diseases by means of the family kinship and phenotype information contained in pedigrees. The program also determines the exact 95% credibility interval for the penetrance estimate. Both an executable version (PenCalc for Windows) and a web version (PenCalcWeb) of the software are available. The web version enables further calculations, such as heterozygosity probabilities and assessment of offspring risks for all individuals in the pedigrees. Both programs can be accessed and downloaded freely at the home page http://www.ib.usp.br/~otto/software.htm.
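The abstract does not spell out PenCalc's estimation procedure, so the sketch below shows only a naive textbook estimator, not the program's pedigree-based method: among children of affected (assumed heterozygous) x unaffected matings, each child is affected with probability K/2, giving K roughly 2 * affected/total, together with a simple uniform-prior 95% credibility interval for comparison. SciPy is assumed, and all numbers are illustrative.

```python
# Naive penetrance estimate for an autosomal dominant trait (NOT PenCalc's method).
from scipy.stats import beta


def naive_penetrance(affected, total):
    """affected/total = children of affected x unaffected matings."""
    k_hat = min(2.0 * affected / total, 1.0)
    # posterior for p = K/2 under a uniform prior: Beta(affected+1, total-affected+1)
    lo, hi = beta.ppf([0.025, 0.975], affected + 1, total - affected + 1)
    return k_hat, (min(2.0 * lo, 1.0), min(2.0 * hi, 1.0))


if __name__ == "__main__":
    k, (lo, hi) = naive_penetrance(affected=14, total=40)
    print(f"K_hat = {k:.2f}, 95% credibility interval = ({lo:.2f}, {hi:.2f})")
```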
Abstract:
During a four-month scholarly leave in the United States of America, researchers designed a culturally appropriate eating disorder (ED) prevention program for Brazilian adolescent girls. The program, "Se Liga na Nutrição", was modeled on other effective programs identified in a review of the research literature and was carried out over eleven interactive sessions. It was positively received by the adolescents, who suggested that it be made part of the school curriculum. The girls reported that it helped them develop critical thinking skills with regard to sociocultural norms about body image, food, and eating practices.
Abstract:
The objective of this study was to compare the impact on knowledge and counseling skills of face-to-face and Internet-based oral health training programs for medical students. Participants consisted of 148 (82 percent) of the 180 invited students attending their fifth academic year at the Faculty of Medicine, University of Sao Paulo, Brazil, in 2007. The interventions took place during a three-month training period in the clinical Center for Health Promotion, which comprised part of a clerkship in Internal Medicine. The students were divided into four groups: 1) Control Group (Control), with the basic intervention; 2) Brochure Group (Br), with the basic intervention plus a complete brochure on oral health themes; 3) Cybertutor Group (Cy), with the basic intervention plus access to an Internet-based training program on oral health themes; and 4) Cybertutor + Contact Group (Cy+C), the same as Cy plus brief proactive contact with a tutor. The impact of these interventions on student knowledge was measured with pre- and post-assessments, and student skills in asking and counseling about oral health were assessed with an objective structured clinical examination (OSCE). Multivariate logistic regression models were applied to identify the odds ratios of scoring above the Control group's medians on the final assessment and the OSCE. In the results, Cy+C performed significantly better than Control on both the final assessment (OR 9.4; 95% CI 2.7-32.8) and the OSCE (OR 5.6; 95% CI 1.9-16.3) and outperformed all the other groups. The Cy+C group showed the greatest increase in knowledge and the best skills in asking and counseling about oral health.