35 resultados para Parallel numerical algorithms
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia Mecânica na Área de Manutenção e Produção
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações
Resumo:
International Conference with Peer Review 2012 IEEE International Conference in Geoscience and Remote Sensing Symposium (IGARSS), 22-27 July 2012, Munich, Germany
Resumo:
Dissertação de natureza Científica para obtenção do grau de Mestre em Engenharia Civil
Resumo:
This letter presents a new parallel method for hyperspectral unmixing composed by the efficient combination of two popular methods: vertex component analysis (VCA) and sparse unmixing by variable splitting and augmented Lagrangian (SUNSAL). First, VCA extracts the endmember signatures, and then, SUNSAL is used to estimate the abundance fractions. Both techniques are highly parallelizable, which significantly reduces the computing time. A design for the commodity graphics processing units of the two methods is presented and evaluated. Experimental results obtained for simulated and real hyperspectral data sets reveal speedups up to 100 times, which grants real-time response required by many remotely sensed hyperspectral applications.
Resumo:
Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGA). General-Purpose Graphics Processing Units (GPGPU) and recent many-core CPUs have also taken advantage of the recent technological innovations in integrated circuit (IC) design and had also dramatically improved their peak performances. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performances, for particular applications, show that FPGAs are increasing the gap to GPUs and many-core CPUs moving them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and parallel map-reduce problems. © 2014 Technical University of Munich (TUM).
Resumo:
Real structures can be thought as an assembly of components, as for instances plates, shells and beams. This later type of component is very commonly found in structures like frames which can involve a significant degree of complexity or as a reinforcement element of plates or shells. To obtain the desired mechanical behavior of these components or to improve their operating conditions when rehabilitating structures, one of the eventual parameters to consider for that purpose, when possible, is the location of the supports. In the present work, a beam-type structure is considered, and for a set of cases concerning different number and types of supports, as well as different load cases, the authors optimize the location of the supports in order to obtain minimum values of the maximum transverse deflection. The optimization processes are carried out using genetic algorithms. The results obtained, clearly show a good performance of the approach proposed. © 2014 IEEE.
Resumo:
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. For developing an application using MapReduce there is a need to install/configure/access specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lacks flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
Resumo:
An improved class of Boussinesq systems of an arbitrary order using a wave surface elevation and velocity potential formulation is derived. Dissipative effects and wave generation due to a time-dependent varying seabed are included. Thus, high-order source functions are considered. For the reduction of the system order and maintenance of some dispersive characteristics of the higher-order models, an extra O(mu 2n+2) term (n ??? N) is included in the velocity potential expansion. We introduce a nonlocal continuous/discontinuous Galerkin FEM with inner penalty terms to calculate the numerical solutions of the improved fourth-order models. The discretization of the spatial variables is made using continuous P2 Lagrange elements. A predictor-corrector scheme with an initialization given by an explicit RungeKutta method is also used for the time-variable integration. Moreover, a CFL-type condition is deduced for the linear problem with a constant bathymetry. To demonstrate the applicability of the model, we considered several test cases. Improved stability is achieved.
Resumo:
Background: Brown adipose tissue (BAT) plays an important role in whole body metabolism and could potentially mediate weight gain and insulin sensitivity. Although some imaging techniques allow BAT detection, there are currently no viable methods for continuous acquisition of BAT energy expenditure. We present a non-invasive technique for long term monitoring of BAT metabolism using microwave radiometry. Methods: A multilayer 3D computational model was created in HFSS™ with 1.5 mm skin, 3-10 mm subcutaneous fat, 200 mm muscle and a BAT region (2-6 cm3) located between fat and muscle. Based on this model, a log-spiral antenna was designed and optimized to maximize reception of thermal emissions from the target (BAT). The power absorption patterns calculated in HFSS™ were combined with simulated thermal distributions computed in COMSOL® to predict radiometric signal measured from an ultra-low-noise microwave radiometer. The power received by the antenna was characterized as a function of different levels of BAT metabolism under cold and noradrenergic stimulation. Results: The optimized frequency band was 1.5-2.2 GHz, with averaged antenna efficiency of 19%. The simulated power received by the radiometric antenna increased 2-9 mdBm (noradrenergic stimulus) and 4-15 mdBm (cold stimulus) corresponding to increased 15-fold BAT metabolism. Conclusions: Results demonstrated the ability to detect thermal radiation from small volumes (2-6 cm3) of BAT located up to 12 mm deep and to monitor small changes (0.5°C) in BAT metabolism. As such, the developed miniature radiometric antenna sensor appears suitable for non-invasive long term monitoring of BAT metabolism.
Resumo:
Dissertação para obtenção do grau de Mestre em Engenharia Mecânica na Área de Manutenção e Produção
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Manutenção
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia Mecânica
Resumo:
Recent integrated circuit technologies have opened the possibility to design parallel architectures with hundreds of cores on a single chip. The design space of these parallel architectures is huge with many architectural options. Exploring the design space gets even more difficult if, beyond performance and area, we also consider extra metrics like performance and area efficiency, where the designer tries to design the architecture with the best performance per chip area and the best sustainable performance. In this paper we present an algorithm-oriented approach to design a many-core architecture. Instead of doing the design space exploration of the many core architecture based on the experimental execution results of a particular benchmark of algorithms, our approach is to make a formal analysis of the algorithms considering the main architectural aspects and to determine how each particular architectural aspect is related to the performance of the architecture when running an algorithm or set of algorithms. The architectural aspects considered include the number of cores, the local memory available in each core, the communication bandwidth between the many-core architecture and the external memory and the memory hierarchy. To exemplify the approach we did a theoretical analysis of a dense matrix multiplication algorithm and determined an equation that relates the number of execution cycles with the architectural parameters. Based on this equation a many-core architecture has been designed. The results obtained indicate that a 100 mm(2) integrated circuit design of the proposed architecture, using a 65 nm technology, is able to achieve 464 GFLOPs (double precision floating-point) for a memory bandwidth of 16 GB/s. This corresponds to a performance efficiency of 71 %. Considering a 45 nm technology, a 100 mm(2) chip attains 833 GFLOPs which corresponds to 84 % of peak performance These figures are better than those obtained by previous many-core architectures, except for the area efficiency which is limited by the lower memory bandwidth considered. The results achieved are also better than those of previous state-of-the-art many-cores architectures designed specifically to achieve high performance for matrix multiplication.
Resumo:
This work reports a theoretical study aimed to identify the plasmonic resonance condition for a system formed by metallic nanoparticles embedded in an a-Si: H matrix. The study is based on a Tauc-Lorentz model for the electrical permittivity of a-Si: H and a Drude model for the metallic nanoparticles. It is calculated the The polarizability of an sphere and ellipsoidal shaped metal nanoparticles with radius of 20 nm. We also performed FDTD simulations of light propagation inside this structure reporting a comparison among the effects caused by a single nanoparticles of Aluminium, Silver and, as a comparison, an ideally perfectly conductor. The simulation results shows that is possible to obtain a plasmonic resonance in the red part of the spectrum (600-700 nm) when 20-30 nm radius Aluminium ellipsoids are embedded into a-Si: H.