933 resultados para parallel and distributed information processing
Resumo:
Documento submetido para revisão pelos pares. A publicar em Journal of Parallel and Distributed Computing. ISSN 0743-7315
Resumo:
La présente recherche est constituée de deux études. Dans l’étude 1, il s’agit d’améliorer la validité écologique des travaux sur la reconnaissance émotionnelle faciale (REF) en procédant à la validation de stimuli qui permettront d’étudier cette question en réalité virtuelle. L’étude 2 vise à documenter la relation entre le niveau de psychopathie et la performance à une tâche de REF au sein d’un échantillon de la population générale. Pour ce faire, nous avons créé des personnages virtuels animés de différentes origines ethniques exprimant les six émotions fondamentales à différents niveaux d’intensité. Les stimuli, sous forme statique et dynamique, ont été évalués par des étudiants universitaires. Les résultats de l’étude 1 indiquent que les stimuli virtuels, en plus de comporter plusieurs traits distinctifs, constituent un ensemble valide pour étudier la REF. L’étude 2 a permis de constater qu’un score plus élevé à l’échelle de psychopathie, spécifiquement à la facette de l’affect plat, est associé à une plus grande sensibilité aux expressions émotionnelles, particulièrement pour la tristesse. Inversement, un niveau élevé de tendances criminelles est, pour sa part, associé à une certaine insensibilité générale et à un déficit spécifique pour le dégoût. Ces résultats sont spécifiques aux participants masculins. Les données s’inscrivent dans une perspective évolutive de la psychopathie. L’étude met en évidence l’importance d’étudier l’influence respective des facettes de la personnalité psychopathique, ce même dans des populations non-cliniques. De plus, elle souligne la manifestation différentielle des tendances psychopathiques chez les hommes et chez les femmes.
Resumo:
Frequent pattern discovery in structured data is receiving an increasing attention in many application areas of sciences. However, the computational complexity and the large amount of data to be explored often make the sequential algorithms unsuitable. In this context high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution in a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well known National Cancer Institute’s HIV-screening dataset.
Resumo:
Traditionally, applications and tools supporting collaborative computing have been designed only with personal computers in mind and support a limited range of computing and network platforms. These applications are therefore not well equipped to deal with network heterogeneity and, in particular, do not cope well with dynamic network topologies. Progress in this area must be made if we are to fulfil the needs of users and support the diversity, mobility, and portability that are likely to characterise group work in future. This paper describes a groupware platform called Coco that is designed to support collaboration in a heterogeneous network environment. The work demonstrates that progress in the p development of a generic supporting groupware is achievable, even in the context of heterogeneous and dynamic networks. The work demonstrates the progress made in the development of an underlying communications infrastructure, building on peer-to-peer concept and topologies to improve scalability and robustness.
Resumo:
Synchronous collaborative systems allow geographically distributed users to form a virtual work environment enabling cooperation between peers and enriching the human interaction. The technology facilitating this interaction has been studied for several years and various solutions can be found at present. In this paper, we discuss our experiences with one such widely adopted technology, namely the Access Grid [1]. We describe our experiences with using this technology, identify key problem areas and propose our solution to tackle these issues appropriately. Moreover, we propose the integration of Access Grid with an Application Sharing tool, developed by the authors. Our approach allows these integrated tools to utilise the enhanced features provided by our underlying dynamic transport layer.
Resumo:
Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
Fully connected cubic networks (FCCNs) are a class of newly proposed hierarchical interconnection networks for multicomputer systems, which enjoy the strengths of constant node degree and good expandability. The shortest path routing in FCCNs is an open problem. In this paper, we present an oblivious routing algorithm for n-level FCCN with N = 8(n) nodes, and prove that this algorithm creates a shortest path from the source to the destination. At the costs of both an O(N)-parallel-step off-line preprocessing phase and a list of size N stored at each node, the proposed algorithm is carried out at each related node in O(n) time. In some cases the proposed algorithm is superior to the one proposed by Chang and Wang in terms of the length of the routing path. This justifies the utility of our routing strategy. (C) 2006 Elsevier Inc. All rights reserved.
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
Resumo:
Hybrid multiprocessor architectures which combine re-configurable computing and multiprocessors on a chip are being proposed to transcend the performance of standard multi-core parallel systems. Both fine-grained and coarse-grained parallel algorithm implementations are feasible in such hybrid frameworks. A compositional strategy for designing fine-grained multi-phase regular processor arrays to target hybrid architectures is presented in this paper. The method is based on deriving component designs using classical regular array techniques and composing the components into a unified global design. Effective designs with phase-changes and data routing at run-time are characteristics of these designs. In order to describe the data transfer between phases, the concept of communication domain is introduced so that the producer–consumer relationship arising from multi-phase computation can be treated in a unified way as a data routing phase. This technique is applied to derive new designs of multi-phase regular arrays with different dataflow between phases of computation.
Resumo:
The InteGrade project is a multi-university effort to build a novel grid computing middleware based on the opportunistic use of resources belonging to user workstations. The InteGrade middleware currently enables the execution of sequential, bag-of-tasks, and parallel applications that follow the BSP or the MPI programming models. This article presents the lessons learned over the last five years of the InteGrade development and describes the solutions achieved concerning the support for robust application execution. The contributions cover the related fields of application scheduling, execution management, and fault tolerance. We present our solutions, describing their implementation principles and evaluation through the analysis of several experimental results. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
This study includes the results of the analysis of areas susceptible to degradation by remote sensing in semi-arid region, which is a matter of concern and affects the whole population and the catalyst of this process occurs by the deforestation of the savanna and improper practices by the use of soil. The objective of this research is to use biophysical parameters of the MODIS / Terra and images TM/Landsat-5 to determine areas susceptible to degradation in semi-arid Paraiba. The study area is located in the central interior of Paraíba, in the sub-basin of the River Taperoá, with average annual rainfall below 400 mm and average annual temperature of 28 ° C. To draw up the map of vegetation were used TM/Landsat-5 images, specifically, the composition 5R4G3B colored, commonly used for mapping land use. This map was produced by unsupervised classification by maximum likelihood. The legend corresponds to the following targets: savanna vegetation sparse and dense, riparian vegetation and exposed soil. The biophysical parameters used in the MODIS were emissivity, albedo and vegetation index for NDVI (NDVI). The GIS computer programs used were Modis Reprojections Tools and System Information Processing Georeferenced (SPRING), which was set up and worked the bank of information from sensors MODIS and TM and ArcGIS software for making maps more customizable. Initially, we evaluated the behavior of the vegetation emissivity by adapting equation Bastiaanssen on NDVI for spatialize emissivity and observe changes during the year 2006. The albedo was used to view your percentage of increase in the periods December 2003 and 2004. The image sensor of Landsat TM were used for the month of December 2005, according to the availability of images and in periods of low emissivity. For these applications were made in language programs for GIS Algebraic Space (LEGAL), which is a routine programming SPRING, which allows you to perform various types of algebras of spatial data and maps. For the detection of areas susceptible to environmental degradation took into account the behavior of the emissivity of the savanna that showed seasonal coinciding with the rainy season, reaching a maximum emissivity in the months April to July and in the remaining months of a low emissivity . With the images of the albedo of December 2003 and 2004, it was verified the percentage increase, which allowed the generation of two distinct classes: areas with increased variation percentage of 1 to 11.6% and the percentage change in areas with less than 1 % albedo. It was then possible to generate the map of susceptibility to environmental degradation, with the intersection of the class of exposed soil with varying percentage of the albedo, resulting in classes susceptibility to environmental degradation
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
The multi-relational Data Mining approach has emerged as alternative to the analysis of structured data, such as relational databases. Unlike traditional algorithms, the multi-relational proposals allow mining directly multiple tables, avoiding the costly join operations. In this paper, is presented a comparative study involving the traditional Patricia Mine algorithm and its corresponding multi-relational proposed, MR-Radix in order to evaluate the performance of two approaches for mining association rules are used for relational databases. This study presents two original contributions: the proposition of an algorithm multi-relational MR-Radix, which is efficient for use in relational databases, both in terms of execution time and in relation to memory usage and the presentation of the empirical approach multirelational advantage in performance over several tables, which avoids the costly join operations from multiple tables. © 2011 IEEE.
Resumo:
Multi-relational data mining enables pattern mining from multiple tables. The existing multi-relational mining association rules algorithms are not able to process large volumes of data, because the amount of memory required exceeds the amount available. The proposed algorithm MRRadix presents a framework that promotes the optimization of memory usage. It also uses the concept of partitioning to handle large volumes of data. The original contribution of this proposal is enable a superior performance when compared to other related algorithms and moreover successfully concludes the task of mining association rules in large databases, bypass the problem of available memory. One of the tests showed that the MR-Radix presents fourteen times less memory usage than the GFP-growth. © 2011 IEEE.