863 resultados para Parallel Architectures
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Peer-reviewed
Resumo:
Most of the commercial and financial data are stored in decimal fonn. Recently, support for decimal arithmetic has received increased attention due to the growing importance in financial analysis, banking, tax calculation, currency conversion, insurance, telephone billing and accounting. Performing decimal arithmetic with systems that do not support decimal computations may give a result with representation error, conversion error, and/or rounding error. In this world of precision, such errors are no more tolerable. The errors can be eliminated and better accuracy can be achieved if decimal computations are done using Decimal Floating Point (DFP) units. But the floating-point arithmetic units in today's general-purpose microprocessors are based on the binary number system, and the decimal computations are done using binary arithmetic. Only few common decimal numbers can be exactly represented in Binary Floating Point (BF P). ln many; cases, the law requires that results generated from financial calculations performed on a computer should exactly match with manual calculations. Currently many applications involving fractional decimal data perform decimal computations either in software or with a combination of software and hardware. The performance can be dramatically improved by complete hardware DFP units and this leads to the design of processors that include DF P hardware.VLSI implementations using same modular building blocks can decrease system design and manufacturing cost. A multiplexer realization is a natural choice from the viewpoint of cost and speed.This thesis focuses on the design and synthesis of efficient decimal MAC (Multiply ACeumulate) architecture for high speed decimal processors based on IEEE Standard for Floating-point Arithmetic (IEEE 754-2008). The research goal is to design and synthesize deeimal'MAC architectures to achieve higher performance.Efficient design methods and architectures are developed for a high performance DFP MAC unit as part of this research.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
We propose a bridge between two important parallel programming paradigms: data parallelism and communicating sequential processes (CSP). Data parallel pipelined architectures obtained with the Alpha language can be embedded in a control intensive application expressed in CSP-based Handel formalism. The interface is formally defined from the semantics of the languages Alpha and Handel. This work will ease the design of compute intensive applications on FPGAs.
Resumo:
Both the (5,3) counter and (2,2,3) counter multiplication techniques are investigated for the efficiency of their operation speed and the viability of the architectures when implemented in a fast bipolar ECL technology. The implementation of the counters in series-gated ECL and threshold logic are contrasted for speed, noise immunity and complexity, and are critically compared with the fastest practical design of a full-adder. A novel circuit technique to overcome the problems of needing high fan-in input weights in threshold circuits through the use of negative weighted inputs is presented. The authors conclude that a (2,2,3) counter based array multiplier implemented in series-gated ECL should enable a significant increase in speed over conventional full adder based array multipliers.
Resumo:
Modular product architectures have generated numerous benefits for companies in terms of cost, lead-time and quality. The defined interfaces and the module’s properties decrease the effort to develop new product variants, and provide an opportunity to perform parallel tasks in design, manufacturing and assembly. The background of this thesis is that companies perform verifications (tests, inspections and controls) of products late, when most of the parts have been assembled. This extends the lead-time to delivery and ruins benefits from a modular product architecture; specifically when the verifications are extensive and the frequency of detected defects is high. Due to the number of product variants obtained from the modular product architecture, verifications must handle a wide range of equipment, instructions and goal values to ensure that high quality products can be delivered. As a result, the total benefits from a modular product architecture are difficult to achieve. This thesis describes a method for planning and performing verifications within a modular product architecture. The method supports companies by utilizing the defined modules for verifications already at module level, so called MPV (Module Property Verification). With MPV, defects are detected at an earlier point, compared to verification of a complete product, and the number of verifications is decreased. The MPV method is built up of three phases. In Phase A, candidate modules are evaluated on the basis of costs and lead-time of the verifications and the repair of defects. An MPV-index is obtained which quantifies the module and indicates if the module should be verified at product level or by MPV. In Phase B, the interface interaction between the modules is evaluated, as well as the distribution of properties among the modules. The purpose is to evaluate the extent to which supplementary verifications at product level is needed. Phase C supports a selection of the final verification strategy. The cost and lead-time for the supplementary verifications are considered together with the results from Phase A and B. The MPV method is based on a set of qualitative and quantitative measures and tools which provide an overview and support the achievement of cost and time efficient company specific verifications. A practical application in industry shows how the MPV method can be used, and the subsequent benefits
Resumo:
In the field of organic thin films, manipulation at the nanoscale can be obtained by immobilization of different materials on platforms designed to enhance a specific property via the layer-by-layer technique. In this paper we describe the fabrication of nanostructured films containing cobalt tetrasulfonated phthalocyanine (CoTsPc) obtained through the layer-by-layer architecture and assembled with linear poly(allylamine hydrochloride) (PAH) and poly(amidoamine) dendrimer (PAMAM) polyelectrolytes. Film growth was monitored by UV-vis spectroscopy following the Q band of CoTsPc and revealed a linear growth for both systems. Fourier transform infrared (FTIR) spectroscopy showed that the driving force keeping the structure of the films was achieved upon interactions of CoTsPc sulfonic groups with protonated amine groups present in the positive polyelectrolyte. A comprehensive SPR investigation on film growth reproduced the deposition process dynamically and provided an estimation of the thicknesses of the layers. Both FTIR and SPR techniques suggested a preferential orientation of the Pc ring parallel to the substrate. The electrical conductivity of the PAH films deposited on interdigitated electrodes was found to be very sensitive to water vapor. These results point to the development of a phthalocyanine-based humidity sensor obtained from a simple thin film deposition technique, whose ability to tailor molecular organization was crucial to achieve high sensitivity.
Resumo:
Parallel kinematic structures are considered very adequate architectures for positioning and orienti ng the tools of robotic mechanisms. However, developing dynamic models for this kind of systems is sometimes a difficult task. In fact, the direct application of traditional methods of robotics, for modelling and analysing such systems, usually does not lead to efficient and systematic algorithms. This work addre sses this issue: to present a modular approach to generate the dynamic model and through some convenient modifications, how we can make these methods more applicable to parallel structures as well. Kane’s formulati on to obtain the dynamic equations is shown to be one of the easiest ways to deal with redundant coordinates and kinematic constraints, so that a suitable c hoice of a set of coordinates allows the remaining of the modelling procedure to be computer aided. The advantages of this approach are discussed in the modelling of a 3-dof parallel asymmetric mechanisms.
Resumo:
[EN]A new parallel algorithm for simultaneous untangling and smoothing of tetrahedral meshes is proposed in this paper. We provide a detailed analysis of its performance on shared-memory many-core computer architectures. This performance analysis includes the evaluation of execution time, parallel scalability, load balancing, and parallelism bottlenecks. Additionally, we compare the impact of three previously published graph coloring procedures on the performance of our parallel algorithm. We use six benchmark meshes with a wide range of sizes. Using these experimental data sets, we describe the behavior of the parallel algorithm for different data sizes. We demonstrate that this algorithm is highly scalable when it runs on two different high-performance many-core computers with up to 128 processors...
Resumo:
The scale down of transistor technology allows microelectronics manufacturers such as Intel and IBM to build always more sophisticated systems on a single microchip. The classical interconnection solutions based on shared buses or direct connections between the modules of the chip are becoming obsolete as they struggle to sustain the increasing tight bandwidth and latency constraints that these systems demand. The most promising solution for the future chip interconnects are the Networks on Chip (NoC). NoCs are network composed by routers and channels used to inter- connect the different components installed on the single microchip. Examples of advanced processors based on NoC interconnects are the IBM Cell processor, composed by eight CPUs that is installed on the Sony Playstation III and the Intel Teraflops pro ject composed by 80 independent (simple) microprocessors. On chip integration is becoming popular not only in the Chip Multi Processor (CMP) research area but also in the wider and more heterogeneous world of Systems on Chip (SoC). SoC comprehend all the electronic devices that surround us such as cell-phones, smart-phones, house embedded systems, automotive systems, set-top boxes etc... SoC manufacturers such as ST Microelectronics , Samsung, Philips and also Universities such as Bologna University, M.I.T., Berkeley and more are all proposing proprietary frameworks based on NoC interconnects. These frameworks help engineers in the switch of design methodology and speed up the development of new NoC-based systems on chip. In this Thesis we propose an introduction of CMP and SoC interconnection networks. Then focusing on SoC systems we propose: • a detailed analysis based on simulation of the Spidergon NoC, a ST Microelectronics solution for SoC interconnects. The Spidergon NoC differs from many classical solutions inherited from the parallel computing world. Here we propose a detailed analysis of this NoC topology and routing algorithms. Furthermore we propose aEqualized a new routing algorithm designed to optimize the use of the resources of the network while also increasing its performance; • a methodology flow based on modified publicly available tools that combined can be used to design, model and analyze any kind of System on Chip; • a detailed analysis of a ST Microelectronics-proprietary transport-level protocol that the author of this Thesis helped developing; • a simulation-based comprehensive comparison of different network interface designs proposed by the author and the researchers at AST lab, in order to integrate shared-memory and message-passing based components on a single System on Chip; • a powerful and flexible solution to address the time closure exception issue in the design of synchronous Networks on Chip. Our solution is based on relay stations repeaters and allows to reduce the power and area demands of NoC interconnects while also reducing its buffer needs; • a solution to simplify the design of the NoC by also increasing their performance and reducing their power and area consumption. We propose to replace complex and slow virtual channel-based routers with multiple and flexible small Multi Plane ones. This solution allows us to reduce the area and power dissipation of any NoC while also increasing its performance especially when the resources are reduced. This Thesis has been written in collaboration with the Advanced System Technology laboratory in Grenoble France, and the Computer Science Department at Columbia University in the city of New York.
Resumo:
Questa dissertazione esamina le sfide e i limiti che gli algoritmi di analisi di grafi incontrano in architetture distribuite costituite da personal computer. In particolare, analizza il comportamento dell'algoritmo del PageRank così come implementato in una popolare libreria C++ di analisi di grafi distribuiti, la Parallel Boost Graph Library (Parallel BGL). I risultati qui presentati mostrano che il modello di programmazione parallela Bulk Synchronous Parallel è inadatto all'implementazione efficiente del PageRank su cluster costituiti da personal computer. L'implementazione analizzata ha infatti evidenziato una scalabilità negativa, il tempo di esecuzione dell'algoritmo aumenta linearmente in funzione del numero di processori. Questi risultati sono stati ottenuti lanciando l'algoritmo del PageRank della Parallel BGL su un cluster di 43 PC dual-core con 2GB di RAM l'uno, usando diversi grafi scelti in modo da facilitare l'identificazione delle variabili che influenzano la scalabilità. Grafi rappresentanti modelli diversi hanno dato risultati differenti, mostrando che c'è una relazione tra il coefficiente di clustering e l'inclinazione della retta che rappresenta il tempo in funzione del numero di processori. Ad esempio, i grafi Erdős–Rényi, aventi un basso coefficiente di clustering, hanno rappresentato il caso peggiore nei test del PageRank, mentre i grafi Small-World, aventi un alto coefficiente di clustering, hanno rappresentato il caso migliore. Anche le dimensioni del grafo hanno mostrato un'influenza sul tempo di esecuzione particolarmente interessante. Infatti, si è mostrato che la relazione tra il numero di nodi e il numero di archi determina il tempo totale.
Resumo:
The research has included the efforts in designing, assembling and structurally and functionally characterizing supramolecular biofunctional architectures for optical biosensing applications. In the first part of the study, a class of interfaces based on the biotin-NeutrAvidin binding matrix for the quantitative control of enzyme surface coverage and activity was developed. Genetically modified ß-lactamase was chosen as a model enzyme and attached to five different types of NeutrAvidin-functionalized chip surfaces through a biotinylated spacer. All matrices are suitable for achieving a controlled enzyme surface density. Data obtained by SPR are in excellent agreement with those derived from optical waveguide measurements. Among the various protein-binding strategies investigated in this study, it was found that stiffness and order between alkanethiol-based SAMs and PEGylated surfaces are very important. Matrix D based on a Nb2O5 coating showed a satisfactory regeneration possibility. The surface-immobilized enzymes were found to be stable and sufficiently active enough for a catalytic activity assay. Many factors, such as the steric crowding effect of surface-attached enzymes, the electrostatic interaction between the negatively charged substrate (Nitrocefin) and the polycationic PLL-g-PEG/PEG-Biotin polymer, mass transport effect, and enzyme orientation, are shown to influence the kinetic parameters of catalytic analysis. Furthermore, a home-built Surface Plasmon Resonance Spectrometer of SPR and a commercial miniature Fiber Optic Absorbance Spectrometer (FOAS), served as a combination set-up for affinity and catalytic biosensor, respectively. The parallel measurements offer the opportunity of on-line activity detection of surface attached enzymes. The immobilized enzyme does not have to be in contact with the catalytic biosensor. The SPR chip can easily be cleaned and used for recycling. Additionally, with regard to the application of FOAS, the integrated SPR technique allows for the quantitative control of the surface density of the enzyme, which is highly relevant for the enzymatic activity. Finally, the miniaturized portable FOAS devices can easily be combined as an add-on device with many other in situ interfacial detection techniques, such as optical waveguide lightmode spectroscopy (OWLS), the quartz crystal microbalance (QCM) measurements, or impedance spectroscopy (IS). Surface plasmon field-enhanced fluorescence spectroscopy (SPFS) allows for an absolute determination of intrinsic rate constants describing the true parameters that control interfacial hybridization. Thus it also allows for a study of the difference of the surface coupling influences between OMCVD gold particles and planar metal films presented in the second part. The multilayer growth process was found to proceed similarly to the way it occurs on planar metal substrates. In contrast to planar bulk metal surfaces, metal colloids exhibit a narrow UV-vis absorption band. This absorption band is observed if the incident photon frequency is resonant with the collective oscillation of the conduction electrons and is known as the localized surface plasmon resonance (LSPR). LSPR excitation results in extremely large molar extinction coefficients, which are due to a combination of both absorption and scattering. When considering metal-enhanced fluorescence we expect the absorption to cause quenching and the scattering to cause enhancement. Our further study will focus on the developing of a detection platform with larger gold particles, which will display a dominant scattering component and enhance the fluorescence signal. Furthermore, the results of sequence-specific detection of DNA hybridization based on OMCVD gold particles provide an excellent application potential for this kind of cheap, simple, and mild preparation protocol applied in this gold fabrication method. In the final chapter, SPFS was used for the in-depth characterizations of the conformational changes of commercial carboxymethyl dextran (CMD) substrate induced by pH and ionic strength variations were studied using surface plasmon resonance spectroscopy. The pH response of CMD is due to the changes in the electrostatics of the system between its protonated and deprotonated forms, while the ionic strength response is attributed from the charge screening effect of the cations that shield the charge of the carboxyl groups and prevent an efficient electrostatic repulsion. Additional studies were performed using SPFS with the aim of fluorophore labeling the carboxymethyl groups. CMD matrices showed typical pH and ionic strength responses, such as high pH and low ionic strength swelling. Furthermore, the effects of the surface charge and the crosslink density of the CMD matrix on the extent of stimuli responses were investigated. The swelling/collapse ratio decreased with decreasing surface concentration of the carboxyl groups and increasing crosslink density. The study of the CMD responses to external and internal variables will provide valuable background information for practical applications.
Resumo:
The 3-UPU three degrees of freedom fully parallel manipulator, where U and P are for universal and prismatic pair respectively, is a very well known manipulator that can provide the platform with three degrees of freedom of pure translation, pure rotation or mixed translation and rotation with respect to the base, according to the relative directions of the revolute pair axes (each universal pair comprises two revolute pairs with intersecting and perpendicular axes). In particular, pure translational parallel 3-UPU manipulators (3-UPU TPMs) received great attention. Many studies have been reported in the literature on singularities, workspace, and joint clearance influence on the platform accuracy of this manipulator. However, much work has still to be done to reveal all the features this topology can offer to the designer when different architecture, i.e. different geometry are considered. Therefore, this dissertation will focus on this type of the 3-UPU manipulators. The first part of the dissertation presents six new architectures of the 3-UPU TPMs which offer interesting features to the designer. In the second part, a procedure is presented which is based on some indexes, in order to allows the designer to select the best architecture of the 3-UPU TPMs for a given task. Four indexes are proposed as stiffness, clearance, singularity and size of the manipulator in order to apply the procedure.
Resumo:
Parallel mechanisms show desirable characteristics such as a large payload to robot weight ratio, considerable stiffness, low inertia and high dynamic performances. In particular, parallel manipulators with fewer than six degrees of freedom have recently attracted researchers’ attention, as their employ may prove valuable in those applications in which a higher mobility is uncalled-for. The attention of this dissertation is focused on translational parallel manipulators (TPMs), that is on parallel manipulators whose output link (platform) is provided with a pure translational motion with respect to the frame. The first part deals with the general problem of the topological synthesis and classification of TPMs, that is it identifies the architectures that TPM legs must possess for the platform to be able to freely translate in space without altering its orientation. The second part studies both constraint and direct singularities of TPMs. In particular, special families of fully-isotropic mechanisms are identified. Such manipulators exhibit outstanding properties, as they are free from singularities and show a constant orthogonal Jacobian matrix throughout their workspace. As a consequence, both the direct and the inverse position problems are linear and the kinematic analysis proves straightforward.