938 results for PDE-based parallel preconditioner
Abstract:
The rapid development of diverse hardware platforms is twofold: on one side, the push for exascale performance for big-data processing and management; on the other, mobile and embedded devices for data collection and human-machine interaction. This has driven a highly hierarchical evolution of programming models. GVirtuS is a general virtualization system, developed in 2009 and first introduced in 2010, that provides a completely transparent layer between GPUs and virtual machines (VMs). This paper presents the latest achievements and developments of GVirtuS, which now supports CUDA 6.5, memory management, and scheduling. Thanks to its new and improved remoting capabilities, GVirtuS now enables GPU sharing among physical and virtual machines based on x86 and ARM CPUs, on local workstations, computing clusters, and distributed cloud appliances.
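As a rough illustration of the transparent remoting idea behind such a layer, the sketch below shows a guest-side stub that serialises an intercepted API call and ships it to a host-side backend over a length-prefixed socket frame. It is purely conceptual: the function names and the pickle-based framing are illustrative assumptions, not the actual GVirtuS front-end/back-end protocol.

```python
import pickle
import struct

def _read_exact(sock, n):
    """Read exactly n bytes from the socket (frames may arrive split)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("backend closed the connection")
        buf += chunk
    return buf

def _send(sock, obj):
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)  # length-prefixed frame

def _recv(sock):
    (n,) = struct.unpack("!I", _read_exact(sock, 4))
    return pickle.loads(_read_exact(sock, n))

def remote_gpu_call(sock, name, *args):
    """Guest-side stub: forward an intercepted GPU API call (e.g. a
    hypothetical 'cudaMalloc') to the backend, which executes it on the
    physical GPU and returns the result."""
    _send(sock, (name, args))
    return _recv(sock)
```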
Abstract:
We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce and, crucially, their use in a loop, in both data-parallel and streaming applications, or a combination of the two. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for developing applications based on parallel patterns. Experiments illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels that run on heterogeneous systems.
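For readers unfamiliar with the pattern, the following minimal Python sketch shows its structure: a stencil is applied over the whole grid, the change is reduced to a scalar residual, and the loop repeats until a convergence predicate holds. The names are illustrative assumptions; the actual FastFlow implementation is a C++ pattern library.

```python
import numpy as np

def loop_stencil_reduce(grid, stencil, reduce_op, converged, max_iter=10_000):
    """Sketch of the Loop-of-stencil-reduce pattern (illustrative only,
    not the FastFlow API): iterate 'stencil' over the grid, reduce the
    change to a scalar residual, and stop when 'converged' holds."""
    it = 0
    for it in range(1, max_iter + 1):
        new = stencil(grid)                        # data-parallel stencil step
        residual = reduce_op(np.abs(new - grid))   # global reduction
        grid = new
        if converged(residual):
            break
    return grid, it

def jacobi(u):
    """5-point Jacobi sweep for the 2-D Laplace equation (example kernel)."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                            + u[1:-1, :-2] + u[1:-1, 2:])
    return v

# e.g.: solution, iters = loop_stencil_reduce(u0, jacobi, np.max,
#                                             lambda r: r < 1e-6)
```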
Abstract:
The evolution of wireless communication systems is leading to Dynamic Spectrum Allocation for Cognitive Radio, which requires reliable spectrum sensing techniques. Among the spectrum sensing methods proposed in the literature, those that exploit the cyclostationary characteristics of radio signals are particularly suitable for communication environments with low signal-to-noise ratios or with non-stationary noise. However, such methods have high computational complexity, which directly raises the power consumption of devices that often have very stringent low-power requirements. We propose a strategy for cyclostationary spectrum sensing with reduced energy consumption, based on the principle that p processors working at slower clock frequencies consume less power than a single processor does for the same execution time. We derive a strict relation between the energy savings and common parallel system metrics. Simulation results show that our strategy promises very significant savings in actual devices.
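The frequency-scaling principle can be made concrete with a standard back-of-the-envelope model (an illustrative assumption, not necessarily the paper's exact derivation): take dynamic power as P ∝ CV²f with supply voltage scaling roughly linearly in clock frequency f, so P ∝ f³. If one processor at frequency f finishes in time T, then p processors with speedup S(p) can each run at frequency f/S(p) and still finish in time T:

```latex
E_1 = c\,f^{3}\,T,
\qquad
E_p = p\,c\left(\frac{f}{S(p)}\right)^{\!3} T
    = \frac{p}{S(p)^{3}}\,E_1
    = \frac{E_1}{\eta^{3}\,p^{2}},
\qquad
\eta = \frac{S(p)}{p}.
```

Here S(p) is the speedup and η the parallel efficiency, so the energy saving is tied directly to common parallel system metrics; under ideal speedup (η = 1) the p slower processors consume a factor of p² less energy for the same execution time.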
Abstract:
Inverse heat conduction problems (IHCPs) appear in many important scientific and technological fields, so the analysis, design, implementation, and testing of inverse algorithms are of great scientific and technological interest. The numerical simulation of 2-D and 3-D inverse (or even direct) problems involves a considerable amount of computation. Therefore, investigating and exploiting the parallel properties of such algorithms is becoming equally important. Domain decomposition (DD) methods are widely used to solve large-scale engineering problems, and their inherent parallelism can be exploited for the solution of such problems.
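As a toy illustration of the DD idea (not the paper's inverse-problem algorithm), the sketch below solves the 1-D Poisson problem -u'' = f with homogeneous Dirichlet boundaries by iterating independent solves on overlapping subdomains; the subdomain count, overlap, and sweep count are arbitrary assumptions.

```python
import numpy as np

def schwarz_1d(f, n, subdomains=4, overlap=4, sweeps=50):
    """Toy overlapping Schwarz iteration for -u'' = f on (0, 1),
    u(0) = u(1) = 0, on n interior grid points. Each subdomain solve is
    independent, which is where the parallelism comes from."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    b = f(x) * h * h
    u = np.zeros(n)
    bounds = np.linspace(0, n, subdomains + 1).astype(int)
    for _ in range(sweeps):
        u_new = u.copy()
        for k in range(subdomains):          # embarrassingly parallel loop
            lo = max(bounds[k] - overlap, 0)
            hi = min(bounds[k + 1] + overlap, n)
            m = hi - lo
            A = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
            rhs = b[lo:hi].copy()
            if lo > 0:
                rhs[0] += u[lo - 1]          # boundary data from neighbour
            if hi < n:
                rhs[-1] += u[hi]
            u_new[lo:hi] = np.linalg.solve(A, rhs)
        u = u_new
    return x, u

# e.g.: x, u = schwarz_1d(lambda x: np.pi**2 * np.sin(np.pi * x), n=199)
# converges towards u(x) = sin(pi * x)
```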
Abstract:
Hyperspectral instruments have been incorporated in satellite missions, providing high-spectral-resolution data of the Earth. These data can be used in remote sensing applications such as target detection, hazard prevention, and the monitoring of oil spills, among others. In most of these applications, a requirement of paramount importance is the ability to give real-time or near real-time responses. Onboard processing systems have recently emerged to cope with the huge amount of data to transfer from the satellite to the ground station, thereby avoiding delays between hyperspectral image acquisition and its interpretation. For this purpose, compact reconfigurable hardware modules such as field-programmable gate arrays (FPGAs) are widely used. This paper proposes a parallel FPGA-based architecture for endmember signature extraction. The method, based on Vertex Component Analysis (VCA), has several advantages: it is unsupervised, fully automatic, and works without a dimensionality reduction (DR) pre-processing step. The architecture has been designed for a low-cost Xilinx Zynq board with a Zynq-7020 SoC, whose programmable logic is based on the Artix-7 FPGA fabric, and tested using real hyperspectral data sets collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the Cuprite mining district in Nevada. Experimental results indicate that the proposed implementation can achieve real-time processing while maintaining the method's accuracy, which indicates the potential of the proposed platform for implementing high-performance, low-cost embedded systems and opens new perspectives for onboard hyperspectral image processing.
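To give a flavour of the algorithm the architecture implements, here is a simplified numpy sketch of the core VCA iteration: each endmember is found by projecting all pixels onto a direction orthogonal to the subspace spanned by the endmembers found so far and taking the most extreme pixel. This is an illustrative reduction of the published algorithm (details such as the SNR-dependent projection are omitted) and says nothing about the FPGA datapath itself.

```python
import numpy as np

def vca_sketch(Y, p, seed=0):
    """Simplified Vertex Component Analysis.

    Y: (bands, pixels) hyperspectral data matrix; p: number of endmembers.
    Returns the (bands, p) endmember signatures and their pixel indices.
    """
    rng = np.random.default_rng(seed)
    bands, _ = Y.shape
    A = np.zeros((bands, p))
    A[:, 0] = rng.standard_normal(bands)        # arbitrary initial direction
    indices = np.zeros(p, dtype=int)
    for k in range(p):
        # direction orthogonal to the subspace of already-found endmembers
        P = np.eye(bands) - A @ np.linalg.pinv(A)
        f = P @ rng.standard_normal(bands)
        f /= np.linalg.norm(f)
        v = f @ Y                               # project every pixel onto f
        indices[k] = int(np.argmax(np.abs(v)))  # most extreme pixel = vertex
        A[:, k] = Y[:, indices[k]]
    return Y[:, indices], indices
```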
Abstract:
A large class of computational problems is characterised by frequent synchronisation and computational requirements that change as a function of time. When such a problem is solved on a message-passing multiprocessor machine [5], the combination of these characteristics leads to system performance that deteriorates over time. As the communication performance of parallel hardware steadily improves, load balance becomes a dominant factor in obtaining high parallel efficiency. Performance can be improved by periodic redistribution of the computational load; however, redistribution can sometimes be very costly. We study the issue of deciding when to invoke a global load re-balancing mechanism. Such a decision policy must actively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. This paper discusses a generic strategy for Dynamic Load Balancing (DLB) in unstructured mesh computational mechanics applications. The strategy is intended to handle varying levels of load change throughout the run. The major issues involved in a generic dynamic load balancing scheme are investigated, together with techniques to automate the implementation of a dynamic load balancing mechanism within the Computer Aided Parallelisation Tools (CAPTools) environment, a semi-automatic tool for the parallelisation of mesh-based FORTRAN codes.
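A minimal sketch of one plausible decision policy (an illustration of the cost-versus-benefit idea, not necessarily the policy adopted in the paper): re-balance once the time lost to imbalance since the last remap exceeds the cost of remapping itself.

```python
def should_remap(step_times, t_balanced, remap_cost):
    """Decide whether to trigger a global load re-balance.

    step_times: per-step wall-clock times (max over processors) observed
                since the last remap
    t_balanced: estimated per-step time under perfect load balance
    remap_cost: measured or estimated cost of one repartitioning
    """
    time_lost_to_imbalance = sum(t - t_balanced for t in step_times)
    return time_lost_to_imbalance > remap_cost
```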
Abstract:
In many areas of simulation, a crucial component of efficient numerical computation is the use of solution-driven adaptive features: locally adapted meshing or re-meshing, and dynamically changing computational tasks. The full advantages of high-performance computing (HPC) technology will thus only be exploited when efficient parallel adaptive solvers can be realised. The resulting requirement for HPC software is dynamic load balancing, which for many mesh-based applications means dynamic mesh re-partitioning. The DRAMA project has been initiated to address this issue, with a particular focus on the requirements of industrial Finite Element codes, though codes using Finite Volume formulations will also be able to make use of the project results.
Abstract:
This work introduces a tessellation-based model for the declivity analysis of geographic regions. The analysis of relief declivity, which is embedded in the rules of the model, categorizes each tessellation cell, with respect to the whole considered region, according to the sign (positive, negative, or null) of the cell's declivity. This information is represented in the states assumed by the cells of the model. The overall configuration of the cells allows the division of the region into subregions of cells belonging to the same category, that is, presenting the same declivity sign. In order to control the errors coming from the discretization of the region into tessellation cells, or resulting from numerical computations, interval techniques are used. The implementation of the model is naturally parallel, since the analysis is performed on the basis of local rules. An immediate application is in geophysics, where an adequate subdivision of geographic areas into segments presenting similar topographic characteristics is often convenient.
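The interval-based sign classification at the heart of the cell rules can be sketched as follows (an illustration under assumed interval enclosures of cell heights; the names and tolerance parameter are hypothetical):

```python
def declivity_sign(z_a, z_b, eps=0.0):
    """Classify the declivity between two neighbouring cells whose heights
    are known only as interval enclosures z = (lo, hi), so that
    discretization and rounding errors are accounted for rigorously.

    Returns +1 (positive), -1 (negative), 0 (null within tolerance eps),
    or None when the enclosure is too wide to decide the sign."""
    diff_lo = z_b[0] - z_a[1]   # lower bound on the height difference
    diff_hi = z_b[1] - z_a[0]   # upper bound on the height difference
    if diff_lo > eps:
        return +1
    if diff_hi < -eps:
        return -1
    if abs(diff_lo) <= eps and abs(diff_hi) <= eps:
        return 0
    return None                 # interval straddles the tolerance band
```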
Abstract:
The central product of the DRAMA (Dynamic Re-Allocation of Meshes for parallel Finite Element Applications) project is a library comprising a variety of tools for dynamic re-partitioning of unstructured Finite Element (FE) applications. The input to the DRAMA library is the computational mesh, with corresponding costs, partitioned into sub-domains. The core library functions then perform a parallel computation of a mesh re-allocation that will re-balance the costs based on the DRAMA cost model. We discuss the basic features of this cost model, which allows a general approach to load identification, modelling, and imbalance minimisation. Results from crash simulations are presented, showing the necessity for multi-phase/multi-constraint partitioning components.
Abstract:
In today's big data world, data is being produced in massive volumes, at great velocity, and from a variety of sources such as mobile devices, sensors, a plethora of small devices hooked to the Internet (the Internet of Things), social networks, communication networks, and many others. Interactive querying and large-scale analytics are increasingly used to derive value from this big data. A large portion of this data is stored and processed in the Cloud due to the advantages the Cloud provides, such as scalability, elasticity, availability, low cost of ownership, and overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage, and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments.

In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built, and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime.

In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has traditionally been used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying, which provides data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics.

Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud.
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, and link prediction. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task, and loading them onto distributed memory, leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes, with several real-world data sets and applications, validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
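The contrast with vertex-centric frameworks is easiest to see in code. The sketch below, written against networkx for brevity, lets a user function operate on a whole k-hop ego network at once; it illustrates the neighborhood-centric programming idea only, not the NSCALE API or its distributed execution engine.

```python
import networkx as nx

def neighborhood_centric(G, k, analyze):
    """Run a user-supplied analysis on each vertex's k-hop ego network,
    i.e. on whole subgraphs rather than on single-vertex state."""
    results = {}
    for v in G.nodes:
        ego = nx.ego_graph(G, v, radius=k)   # extract the k-hop neighborhood
        results[v] = analyze(ego)            # user code sees the subgraph
    return results

# e.g. ego-network density around every user in a social graph:
# densities = neighborhood_centric(G, 1, nx.density)
```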
Abstract:
Importance: Critical illness results in disability and reduced health-related quality of life (HRQOL), but the optimum timing and components of rehabilitation are uncertain. Objective: To evaluate the effect of increasing physical and nutritional rehabilitation plus information delivered during the post–intensive care unit (ICU) acute hospital stay by dedicated rehabilitation assistants on subsequent mobility, HRQOL, and prevalent disabilities. Design, Setting, and Participants: A parallel group, randomized clinical trial with blinded outcome assessment at 2 hospitals in Edinburgh, Scotland, of 240 patients discharged from the ICU between December 1, 2010, and January 31, 2013, who required at least 48 hours of mechanical ventilation. Analysis for the primary outcome and other 3-month outcomes was performed between June and August 2013; for the 6- and 12-month outcomes and the health economic evaluation, between March and April 2014. Interventions: During the post-ICU hospital stay, both groups received physiotherapy and dietetic, occupational, and speech/language therapy, but patients in the intervention group received rehabilitation that typically increased the frequency of mobility and exercise therapies 2- to 3-fold, increased dietetic assessment and treatment, used individualized goal setting, and provided greater illness-specific information. Intervention group therapy was coordinated and delivered by a dedicated rehabilitation practitioner. Main Outcomes and Measures: The Rivermead Mobility Index (RMI) (range 0-15) at 3 months; higher scores indicate greater mobility. Secondary outcomes included HRQOL, psychological outcomes, self-reported symptoms, patient experience, and cost-effectiveness during a 12-month follow-up (completed in February 2014). Results: Median RMI at randomization was 3 (interquartile range [IQR], 1-6) and at 3 months was 13 (IQR, 10-14) for the intervention and usual care groups (mean difference, −0.2 [95% CI, −1.3 to 0.9; P = .71]). The HRQOL scores were unchanged by the intervention (mean difference in the Physical Component Summary score, −0.1 [95% CI, −3.3 to 3.1; P = .96]; and in the Mental Component Summary score, 0.2 [95% CI, −3.4 to 3.8; P = .91]). No differences were found for self-reported symptoms of fatigue, pain, appetite, joint stiffness, or breathlessness. Levels of anxiety, depression, and posttraumatic stress were similar, as were hand grip strength and the timed Up & Go test. No differences were found at the 6- or 12-month follow-up for any outcome measures. However, patients in the intervention group reported greater satisfaction with physiotherapy, nutritional support, coordination of care, and information provision. Conclusions and Relevance: Post-ICU hospital-based rehabilitation, including increased physical and nutritional therapy plus information provision, did not improve physical recovery or HRQOL, but improved patient satisfaction with many aspects of recovery.
Abstract:
Vertebrate genomes are organised into a variety of nuclear environments and chromatin states that have profound effects on the regulation of gene transcription. This variation presents a major challenge to the expression of transgenes for experimental research, genetic therapies, and the production of biopharmaceuticals. The majority of transgenes succumb to transcriptional silencing by their chromosomal environment when they are randomly integrated into the genome, a phenomenon known as chromosomal position effect (CPE). It is not always feasible to target transgene integration to transcriptionally permissive “safe harbour” loci that favour transgene expression, so there remains an unmet need to identify gene regulatory elements that can be added to transgenes to protect them against CPE. Dominant regulatory elements (DREs) with chromatin barrier (or boundary) activity have been shown to protect transgenes from CPE. The HS4 element from the chicken beta-globin locus and the A2UCOE element from a human housekeeping gene locus have been shown to function as DRE barriers in a wide variety of cell types and species. Despite rapid advances in the profiling of transcription factor binding, chromatin states, and chromosomal looping interactions, progress towards functionally validating the many candidate barrier elements in vertebrates has been very slow, largely owing to the lack of a tractable and efficient assay for chromatin barrier activity. In this study, I have developed the RGBarrier assay system to test the chromatin barrier activity of candidate DREs at pre-defined isogenic loci in human cells. The RGBarrier assay consists of a Flp-based RMCE reaction for the integration of an expression construct, carrying candidate DREs, into a pre-characterised chromosomal location. The RGBarrier system tracks red, green, and blue fluorescent proteins by flow cytometry to monitor on-target versus off-target integration and transgene expression. Analysis of reporter (GFP) expression over several weeks gives a measure of each candidate element's ability to protect the transgene from chromosomal silencing. The assay can be scaled up to test tens of new putative barrier elements in the same chromosomal context in parallel. The defined chromosomal contexts of the RGBarrier assay will allow detailed mechanistic studies of chromosomal silencing and of DRE barrier element action. Understanding these mechanisms will be of paramount importance for the design of specific solutions for overcoming chromosomal silencing in specific transgenic applications.
Abstract:
The research described in this thesis was motivated by the need for a robust model capable of representing the 3D data obtained with 3D sensors, which are inherently noisy. In addition, time constraints have to be considered, as these sensors are capable of providing a 3D data stream in real time. This thesis proposes the use of Self-Organizing Maps (SOMs) as a 3D representation model. In particular, we propose the use of the Growing Neural Gas (GNG) network, which has been successfully used for clustering, pattern recognition, and topology representation of multi-dimensional data. Until now, Self-Organizing Maps have been computed primarily offline, and their application to 3D data has mainly focused on noise-free models, without considering time constraints. We propose a GPU-based implementation that leverages the computing power of modern GPUs, taking advantage of the paradigm known as General-Purpose Computing on Graphics Processing Units (GPGPU). The proposed methods were applied to different problems and applications in the area of computer vision, such as the recognition and localization of objects, visual surveillance, and 3D reconstruction.
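For concreteness, the sketch below shows the per-sample adaptation step of GNG on a stream of 3D points; node insertion, isolated-node removal, and error decay, which complete the full algorithm, are omitted, and the parameter values are illustrative assumptions.

```python
import numpy as np

def gng_adapt(nodes, edges, ages, error, x,
              eps_b=0.05, eps_n=0.006, age_max=50):
    """One Growing Neural Gas adaptation step (sketch only).

    nodes: (n, 3) array of 3D reference vectors (the learned map)
    edges: set of frozenset({i, j}) topological connections
    ages:  dict mapping each edge to its current age
    error: (n,) accumulated squared error per node
    x:     incoming 3D sample from the sensor stream
    """
    d = np.linalg.norm(nodes - x, axis=1)
    order = np.argsort(d)
    s1, s2 = int(order[0]), int(order[1])        # two nearest units
    for e in (e for e in list(edges) if s1 in e):
        ages[e] += 1                             # age the winner's edges
    error[s1] += d[s1] ** 2                      # accumulate winner error
    nodes[s1] += eps_b * (x - nodes[s1])         # pull winner towards x
    for e in edges:
        if s1 in e:
            (j,) = e - {s1}
            nodes[j] += eps_n * (x - nodes[j])   # pull topological neighbours
    e12 = frozenset((s1, s2))
    edges.add(e12)
    ages[e12] = 0                                # create/refresh s1-s2 edge
    for e in [e for e in edges if ages[e] > age_max]:
        edges.discard(e)                         # prune stale connections
        del ages[e]
```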
Abstract:
Doctoral thesis, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Civil e Ambiental, 2016.