961 resultados para General-purpose computing on graphics processing units (GPGPU)
Resumo:
One of the main problems of hyperspectral data analysis is the presence of mixed pixels due to the low spatial resolution of such images. Linear spectral unmixing aims at inferring pure spectral signatures and their fractions at each pixel of the scene. The huge data volumes acquired by hyperspectral sensors put stringent requirements on processing and unmixing methods. This letter proposes an efficient implementation of the method called simplex identification via split augmented Lagrangian (SISAL) which exploits the graphics processing unit (GPU) architecture at low level using Compute Unified Device Architecture. SISAL aims to identify the endmembers of a scene, i.e., is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The proposed implementation is performed in a pixel-by-pixel fashion using coalesced accesses to memory and exploiting shared memory to store temporary data. Furthermore, the kernels have been optimized to minimize the threads divergence, therefore achieving high GPU occupancy. The experimental results obtained for the simulated and real hyperspectral data sets reveal speedups up to 49 times, which demonstrates that the GPU implementation can significantly accelerate the method's execution over big data sets while maintaining the methods accuracy.
Resumo:
Parallel hyperspectral unmixing problem is considered in this paper. A semisupervised approach is developed under the linear mixture model, where the abundance's physical constraints are taken into account. The proposed approach relies on the increasing availability of spectral libraries of materials measured on the ground instead of resorting to endmember extraction methods. Since Libraries are potentially very large and hyperspectral datasets are of high dimensionality a parallel implementation in a pixel-by-pixel fashion is derived to properly exploits the graphics processing units (GPU) architecture at low level, thus taking full advantage of the computational power of GPUs. Experimental results obtained for real hyperspectral datasets reveal significant speedup factors, up to 164 times, with regards to optimized serial implementation.
Resumo:
Many Hyperspectral imagery applications require a response in real time or near-real time. To meet this requirement this paper proposes a parallel unmixing method developed for graphics processing units (GPU). This method is based on the vertex component analysis (VCA), which is a geometrical based method highly parallelizable. VCA is a very fast and accurate method that extracts endmember signatures from large hyperspectral datasets without the use of any a priori knowledge about the constituent spectra. Experimental results obtained for simulated and real hyperspectral datasets reveal considerable acceleration factors, up to 24 times.
Resumo:
Current computer systems have evolved from featuring only a single processing unit and limited RAM, in the order of kilobytes or few megabytes, to include several multicore processors, o↵ering in the order of several tens of concurrent execution contexts, and have main memory in the order of several tens to hundreds of gigabytes. This allows to keep all data of many applications in the main memory, leading to the development of inmemory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring in less I/O overhead. In this dissertation, we present a scalability study of two general purpose IMDBs on multicore systems. The results show that current general purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore di↵erent direction to overcome the scalability issues of IMDBs in multicores, while enforcing strong isolation semantics. First, we present a solution that requires no modification to either database systems or to the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on slaves. This reduces contention, allowing MacroDB to o↵er scalable performance under read-only workloads, while updateintensive workloads su↵er from performance loss, when compared to the standalone engine. Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification o↵ers performance improvement under all workloads, when compared to the standalone engine, while scalability is limited to read-only workloads. Next we addressed the scalability limitations for update-intensive workloads, and propose the reduction of locking granularity from the table level to the attribute level. This further improved performance for intensive and moderate update workloads, at a slight cost for read-only workloads. Scalability is limited to intensive-read and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems, by studying how operation order inside transactions influences the database performance. We then propose a Read before Write (RbW) interaction pattern, under which transaction perform all read operations before executing write operations. The RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allowed our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation.
Resumo:
In the present work the benefits of using graphics processing units (GPU) to aid the design of complex geometry profile extrusion dies, are studied. For that purpose, a3Dfinite volume based code that employs unstructured meshes to solve and couple the continuity, momentum and energy conservation equations governing the fluid flow, together with aconstitutive equation, was used. To evaluate the possibility of reducing the calculation time spent on the numerical calculations, the numerical code was parallelized in the GPU, using asimple programing approach without complex memory manipulations. For verificationpurposes, simulations were performed for three benchmark problems: Poiseuille flow, lid-driven cavity flow and flow around acylinder. Subsequently, the code was used on the design of two real life extrusion dies for the production of a medical catheter and a wood plastic composite decking profile. To evaluate the benefits, the results obtained with the GPU parallelized code were compared, in terms of speedup, with a serial implementation of the same code, that traditionally runs on the central processing unit (CPU). The results obtained show that, even with the simple parallelization approach employed, it was possible to obtain a significant reduction of the computation times.
Resumo:
The increasing volume of data describing humandisease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the@neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system’s architecture is generic enough that it could be adapted to the treatment of other diseases.Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers cliniciansthe tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medicalresearchers gain access to a critical mass of aneurysm related data due to the system’s ability to federate distributed informationsources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access andwork on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand forperforming computationally intensive simulations for treatment planning and research.
Resumo:
The intake of carotenoids is associated with antioxidant properties and some of these substances have activity of pro-vitamin A. This study aimed to estimate the intake of carotenoids (average values) by the Brazilian population focusing on beneficiaries of the 'Bolsa Família' Program and identify the dietary sources, according to the purpose and degree of processing and the inclusion of food additives. The database used is the personal food consumption module of the Household Budget Survey of 2008-2009, conducted by the Brazilian Institute of Geography and Statistics. The content of carotenoids in foods was obtained primarily from a National data source. Food products were classified into three categories: 1) fresh and minimally processed foods; 2) processed foods (containing food additives, except for flavoring and coloring agents); and 3) highly processed foods (containing flavoring and coloring agents). Insufficient intakes were identified for the conditional cash transfer program beneficiaries (3,547.1 µg). Fresh and minimally processed foods supplied between 48.6% (for girls) and 65.7% (for male adults) of pro-vitamin carotenoids. Processed foods were sources of between 55.5% and 57.0% of lutein + zeaxanthin for elderly and between 58.0% and 67.8% of lycopene for adults. Highly processed foods contributed to less than 5.0% of total carotenoids.
Resumo:
This study performed a aecondvy dau analysis of information collected during the Youth Leisure Study (YLS). The purpose of this study was to examine the potential moderating influences of gender and general self-efTicacy on the relationships aoKXig sensation-seeking and various forms of substance use in adolescents. Specifically, the predictive ability of sensation seeking on five adolescents substance use outcomes (alcohol, tobacco, and marijuana use; binge drinking; and number of times drunk) was examined. Moderated hierarchical multiple regression (MHMR) analyses were used to examine the relationships among study variables. The results for this study indicate that the relationships among sensation-seeking and forms of adolescent substance use are more complex than literature suggests. Main effect relationships were found consistently for sensation-seeking and general self-efficacy with each of the outcome variables. Results for gender were not consistent across the substance use outcomes. Gender was a significant predictor for marijuana use only. The moderating effects of general self-efficacy (GSE) on the sensation-seekingsubstance use relationship were inconsistent. While no significant interactions were found for tobacco or alcohol use outcomes, GSE was found to moderate the relationship between sensation-seeking and marijuana use indicating that feelings of high general selfefficacy act as a buffer or guard against marijuana use. A consistent pattern was found among the alcohol use variables (alcohol use. binge drinking, and number of times drunk). Gender was found to moderate each of these variables indicating that higher levels of sensation seeking are more predictive of higher levels of adolescent alcohol use in males only. Implications of this study on the field of education, are discussed further, and suggestions for future research are presented.
Resumo:
Older adults represent the most sedentary segment of the adult population, and thus it is critical to investigate factors that influence exercise behaviour for this age group. The purpose of this study was to examine the influence of a general exercise program, incorporating cardiovascular, strength, flexibility, and balance components, on task selfefficacy and SPA in older adult men and women. Participants (n=114, Mage = 67 years) were recruited from the Niagara region and randomly assigned to a 12-week supervised exercise program or a wait-list control. Task self-efficacy and SPA measures were taken at baseline and program end. The present study found that task self-efficacy was a significant predictor of leisure time physical activity for older adults. In addition, change in task self-efficacy was a significant predictor of change in SPA. The findings of this study suggest that sources of task self-efficacy should be considered for exercise interventions targeting older adults.
Resumo:
La tomographie d’émission par positrons (TEP) est une modalité d’imagerie moléculaire utilisant des radiotraceurs marqués par des isotopes émetteurs de positrons permettant de quantifier et de sonder des processus biologiques et physiologiques. Cette modalité est surtout utilisée actuellement en oncologie, mais elle est aussi utilisée de plus en plus en cardiologie, en neurologie et en pharmacologie. En fait, c’est une modalité qui est intrinsèquement capable d’offrir avec une meilleure sensibilité des informations fonctionnelles sur le métabolisme cellulaire. Les limites de cette modalité sont surtout la faible résolution spatiale et le manque d’exactitude de la quantification. Par ailleurs, afin de dépasser ces limites qui constituent un obstacle pour élargir le champ des applications cliniques de la TEP, les nouveaux systèmes d’acquisition sont équipés d’un grand nombre de petits détecteurs ayant des meilleures performances de détection. La reconstruction de l’image se fait en utilisant les algorithmes stochastiques itératifs mieux adaptés aux acquisitions à faibles statistiques. De ce fait, le temps de reconstruction est devenu trop long pour une utilisation en milieu clinique. Ainsi, pour réduire ce temps, on les données d’acquisition sont compressées et des versions accélérées d’algorithmes stochastiques itératifs qui sont généralement moins exactes sont utilisées. Les performances améliorées par l’augmentation de nombre des détecteurs sont donc limitées par les contraintes de temps de calcul. Afin de sortir de cette boucle et permettre l’utilisation des algorithmes de reconstruction robustes, de nombreux travaux ont été effectués pour accélérer ces algorithmes sur les dispositifs GPU (Graphics Processing Units) de calcul haute performance. Dans ce travail, nous avons rejoint cet effort de la communauté scientifique pour développer et introduire en clinique l’utilisation des algorithmes de reconstruction puissants qui améliorent la résolution spatiale et l’exactitude de la quantification en TEP. Nous avons d’abord travaillé sur le développement des stratégies pour accélérer sur les dispositifs GPU la reconstruction des images TEP à partir des données d’acquisition en mode liste. En fait, le mode liste offre de nombreux avantages par rapport à la reconstruction à partir des sinogrammes, entre autres : il permet d’implanter facilement et avec précision la correction du mouvement et le temps de vol (TOF : Time-Of Flight) pour améliorer l’exactitude de la quantification. Il permet aussi d’utiliser les fonctions de bases spatio-temporelles pour effectuer la reconstruction 4D afin d’estimer les paramètres cinétiques des métabolismes avec exactitude. Cependant, d’une part, l’utilisation de ce mode est très limitée en clinique, et d’autre part, il est surtout utilisé pour estimer la valeur normalisée de captation SUV qui est une grandeur semi-quantitative limitant le caractère fonctionnel de la TEP. Nos contributions sont les suivantes : - Le développement d’une nouvelle stratégie visant à accélérer sur les dispositifs GPU l’algorithme 3D LM-OSEM (List Mode Ordered-Subset Expectation-Maximization), y compris le calcul de la matrice de sensibilité intégrant les facteurs d’atténuation du patient et les coefficients de normalisation des détecteurs. Le temps de calcul obtenu est non seulement compatible avec une utilisation clinique des algorithmes 3D LM-OSEM, mais il permet également d’envisager des reconstructions rapides pour les applications TEP avancées telles que les études dynamiques en temps réel et des reconstructions d’images paramétriques à partir des données d’acquisitions directement. - Le développement et l’implantation sur GPU de l’approche Multigrilles/Multitrames pour accélérer l’algorithme LMEM (List-Mode Expectation-Maximization). L’objectif est de développer une nouvelle stratégie pour accélérer l’algorithme de référence LMEM qui est un algorithme convergent et puissant, mais qui a l’inconvénient de converger très lentement. Les résultats obtenus permettent d’entrevoir des reconstructions en temps quasi-réel que ce soit pour les examens utilisant un grand nombre de données d’acquisition aussi bien que pour les acquisitions dynamiques synchronisées. Par ailleurs, en clinique, la quantification est souvent faite à partir de données d’acquisition en sinogrammes généralement compressés. Mais des travaux antérieurs ont montré que cette approche pour accélérer la reconstruction diminue l’exactitude de la quantification et dégrade la résolution spatiale. Pour cette raison, nous avons parallélisé et implémenté sur GPU l’algorithme AW-LOR-OSEM (Attenuation-Weighted Line-of-Response-OSEM) ; une version de l’algorithme 3D OSEM qui effectue la reconstruction à partir de sinogrammes sans compression de données en intégrant les corrections de l’atténuation et de la normalisation dans les matrices de sensibilité. Nous avons comparé deux approches d’implantation : dans la première, la matrice système (MS) est calculée en temps réel au cours de la reconstruction, tandis que la seconde implantation utilise une MS pré- calculée avec une meilleure exactitude. Les résultats montrent que la première implantation offre une efficacité de calcul environ deux fois meilleure que celle obtenue dans la deuxième implantation. Les temps de reconstruction rapportés sont compatibles avec une utilisation clinique de ces deux stratégies.
Resumo:
clRNG et clProbdist sont deux interfaces de programmation (APIs) que nous avons développées pour la génération de nombres aléatoires uniformes et non uniformes sur des dispositifs de calculs parallèles en utilisant l’environnement OpenCL. La première interface permet de créer au niveau d’un ordinateur central (hôte) des objets de type stream considérés comme des générateurs virtuels parallèles qui peuvent être utilisés aussi bien sur l’hôte que sur les dispositifs parallèles (unités de traitement graphique, CPU multinoyaux, etc.) pour la génération de séquences de nombres aléatoires. La seconde interface permet aussi de générer au niveau de ces unités des variables aléatoires selon différentes lois de probabilité continues et discrètes. Dans ce mémoire, nous allons rappeler des notions de base sur les générateurs de nombres aléatoires, décrire les systèmes hétérogènes ainsi que les techniques de génération parallèle de nombres aléatoires. Nous présenterons aussi les différents modèles composant l’architecture de l’environnement OpenCL et détaillerons les structures des APIs développées. Nous distinguons pour clRNG les fonctions qui permettent la création des streams, les fonctions qui génèrent les variables aléatoires uniformes ainsi que celles qui manipulent les états des streams. clProbDist contient les fonctions de génération de variables aléatoires non uniformes selon la technique d’inversion ainsi que les fonctions qui permettent de retourner différentes statistiques des lois de distribution implémentées. Nous évaluerons ces interfaces de programmation avec deux simulations qui implémentent un exemple simplifié d’un modèle d’inventaire et un exemple d’une option financière. Enfin, nous fournirons les résultats d’expérimentation sur les performances des générateurs implémentés.
Resumo:
Studies of construction labour productivity have revealed that limited predictability and multi-agent social complexity make long-range planning of construction projects extremely inaccurate. Fire-fighting, a cultural feature of construction project management, social and structural diversity of involved permanent organizations, and structural temporality all contribute towards relational failures and frequent changes. The main purpose of this paper is therefore to demonstrate that appropriate construction planning may have a profound synergistic effect on structural integration of a project organization. Using the general systems theory perspective it is further a specific objective to investigate and evaluate organizational effects of changes in planning and potentials for achieving continuous project-organizational synergy. The newly developed methodology recognises that planning should also represent a continuous, improvement-leading driving force throughout a project. The synergistic effect of the process planning membership duality fostered project-wide integration, eliminated internal boundaries, and created a pool of constantly upgrading knowledge. It maintained a creative environment that resulted in a number of process-related improvements from all parts of the organization. As a result labour productivity has seen increases of more than 30%, profits have risen from an average of 12% to more than 18%, and project durations have been reduced by several days.
Resumo:
Background: The computational grammatical complexity ( CGC) hypothesis claims that children with G(rammatical)-specific language impairment ( SLI) have a domain-specific deficit in the computational system affecting syntactic dependencies involving 'movement'. One type of such syntactic dependencies is filler-gap dependencies. In contrast, the Generalized Slowing Hypothesis claims that SLI children have a domain-general deficit affecting processing speed and capacity. Aims: To test contrasting accounts of SLI we investigate processing of syntactic (filler-gap) dependencies in wh-questions. Methods & Procedures: Fourteen 10; 2 - 17; 2 G-SLI children, 14 age- matched and 17 vocabulary-matched controls were studied using the cross- modal picturepriming paradigm. Outcomes & Results: G-SLI children's processing speed was significantly slower than the age controls, but not younger vocabulary controls. The G- SLI children and vocabulary controls did not differ on memory span. However, the typically developing and G-SLI children showed a qualitatively different processing pattern. The age and vocabulary controls showed priming at the gap, indicating that they process wh-questions through syntactic filler-gap dependencies. In contrast, G-SLI children showed priming only at the verb. Conclusions: The findings indicate that G-SLI children fail to establish reliably a syntactic filler- gap dependency and instead interpret wh-questions via lexical thematic information. These data challenge the Generalized Slowing Hypothesis account, but support the CGC hypothesis, according to which G-SLI children have a particular deficit in the computational system affecting syntactic dependencies involving 'movement'. As effective remediation often depends on aetiological insight, the discovery of the nature of the syntactic deficit, along side a possible compensatory use of semantics to facilitate sentence processing, can be used to direct therapy. However, the therapeutic strategy to be used, and whether such similar strengths and weaknesses within the language system are found in other SLI subgroups are empirical issues that warrant further research.
Resumo:
SAFT techniques are based on the sequential activation, in emission and reception, of the array elements and the post-processing of all the received signals to compose the image. Thus, the image generation can be divided into two stages: (1) the excitation and acquisition stage, where the signals received by each element or group of elements are stored; and (2) the beamforming stage, where the signals are combined together to obtain the image pixels. The use of Graphics Processing Units (GPUs), which are programmable devices with a high level of parallelism, can accelerate the computations of the beamforming process, that usually includes different functions such as dynamic focusing, band-pass filtering, spatial filtering or envelope detection. This work shows that using GPU technology can accelerate, in more than one order of magnitude with respect to CPU implementations, the beamforming and post-processing algorithms in SAFT imaging. ©2009 IEEE.
Resumo:
In general, pattern recognition techniques require a high computational burden for learning the discriminating functions that are responsible to separate samples from distinct classes. As such, there are several studies that make effort to employ machine learning algorithms in the context of big data classification problems. The research on this area ranges from Graphics Processing Units-based implementations to mathematical optimizations, being the main drawback of the former approaches to be dependent on the graphic video card. Here, we propose an architecture-independent optimization approach for the optimum-path forest (OPF) classifier, that is designed using a theoretical formulation that relates the minimum spanning tree with the minimum spanning forest generated by the OPF over the training dataset. The experiments have shown that the approach proposed can be faster than the traditional one in five public datasets, being also as accurate as the original OPF. (C) 2014 Elsevier B. V. All rights reserved.