881 results for distributed computing
Abstract:
Epidemic protocols are a bio-inspired communication and computation paradigm for extreme-scale network systems based on randomized communication. The protocols rely on a membership service to build decentralized and random overlay topologies. In a weakly connected overlay topology, a naive membership protocol mechanism can break connectivity, thus impairing the accuracy of the application. This work investigates the factors in membership protocols that cause the loss of global connectivity and introduces the first topology connectivity recovery mechanism. The mechanism is integrated into the Expander Membership Protocol, which is then evaluated against other membership protocols. The analysis shows that the proposed connectivity recovery mechanism is effective in preserving topology connectivity and also helps to improve application performance in terms of convergence speed.
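As a minimal sketch of the randomized communication such protocols build on (an illustrative averaging gossip over a partial view, not the Expander Membership Protocol itself; all identifiers are hypothetical):

```python
import random

def gossip_round(node_id, views, values, fanout=2):
    """One push round of a toy epidemic protocol.

    views  : dict mapping node id -> list of neighbor ids (partial view)
    values : dict mapping node id -> current value being aggregated
    Each node pushes to `fanout` random neighbors; repeated rounds
    converge toward the global mean on a connected overlay.
    """
    peers = views[node_id]
    for peer in random.sample(peers, min(fanout, len(peers))):
        # Averaging step: both sides move toward the pairwise mean.
        mean = (values[node_id] + values[peer]) / 2.0
        values[node_id] = values[peer] = mean

# Toy run: a ring overlay of 8 nodes converging to the average.
views = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
values = {i: float(i) for i in range(8)}
for _ in range(50):
    for n in range(8):
        gossip_round(n, views, values)
print(values)  # all entries close to 3.5, the global mean
```

Repeated rounds drive every node toward the global mean, which is why a partitioned overlay stalls: values can only converge within each connected component, motivating a connectivity recovery mechanism.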
Abstract:
The InteGrade project is a multi-university effort to build a novel grid computing middleware based on the opportunistic use of resources belonging to user workstations. The InteGrade middleware currently enables the execution of sequential, bag-of-tasks, and parallel applications that follow the BSP or the MPI programming models. This article presents the lessons learned over the last five years of the InteGrade development and describes the solutions achieved concerning the support for robust application execution. The contributions cover the related fields of application scheduling, execution management, and fault tolerance. We present our solutions, describing their implementation principles and evaluation through the analysis of several experimental results.
Abstract:
Very large scale computations are now routinely used as a methodology to undertake scientific research. In this context, 'provenance systems' are regarded as the equivalent of the scientist's logbook for in silico experimentation: provenance captures the documentation of the process that led to some result. Using a protein compressibility analysis application, we derive a set of generic use cases for a provenance system. In order to support these, we address the following fundamental questions: what is provenance? how do we record it? what is the performance impact for grid execution? what is the performance of reasoning? In doing so, we define a technology-independent notion of provenance that captures interactions between components, internal component information and grouping of interactions, so as to allow us to analyse and reason about the execution of scientific processes. In order to support persistent provenance in heterogeneous applications, we introduce a separate provenance store, in which provenance documentation can be stored, archived and queried independently of the technology used to run the application. Through a series of practical tests, we evaluate the performance impact of such a provenance system. In summary, we demonstrate that the provenance recording overhead of our prototype system remains under 10% of execution time, and we show that the recorded information successfully supports our use cases in a performant manner.
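As a loose illustration of the separate-store idea (the names and schema below are invented for the sketch, not the paper's actual API), a component might record each interaction as an assertion in an independent provenance store:

```python
import json, time, uuid

class ProvenanceStore:
    """Toy stand-alone provenance store: an append-only log of assertions,
    queryable independently of the application that produced them."""
    def __init__(self):
        self.assertions = []

    def record(self, kind, **fields):
        entry = {"id": str(uuid.uuid4()), "kind": kind,
                 "time": time.time(), **fields}
        self.assertions.append(entry)
        return entry["id"]

    def query(self, **criteria):
        return [a for a in self.assertions
                if all(a.get(k) == v for k, v in criteria.items())]

store = ProvenanceStore()
# An interaction between two components, plus internal state of one of them.
msg = store.record("interaction", sender="compressor", receiver="analyser",
                   payload_digest="sha1:deadbeef")
store.record("internal", component="analyser", cause=msg,
             note="window size used for compressibility estimate")
print(json.dumps(store.query(kind="interaction"), indent=2))
```

Because the store is independent of the application's technology, the recorded documentation can be archived and queried long after the run completes.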
Abstract:
HydroShare is an online, collaborative system being developed for open sharing of hydrologic data and models. The goal of HydroShare is to enable scientists to easily discover and access hydrologic data and models, retrieve them to their desktop, or perform analyses in a distributed computing environment that may include grid, cloud or high performance computing model instances as necessary. Scientists may also publish outcomes (data, results or models) into HydroShare, using the system as a collaboration platform for sharing data, models and analyses. HydroShare is expanding the data sharing capability of the CUAHSI Hydrologic Information System by broadening the classes of data accommodated, creating new capability to share models and model components, and taking advantage of emerging social media functionality to enhance information about and collaboration around hydrologic data and models. One of the fundamental concepts in HydroShare is that of a Resource. All content is represented using a Resource Data Model that separates system and science metadata and has elements common to all resources as well as elements specific to the types of resources HydroShare will support. These will include different data types used in the hydrology community, as well as models and workflows that require metadata on execution functionality. The HydroShare web interface and social media functions are being developed using the Drupal content management system. A geospatial visualization and analysis component enables searching, visualizing, and analyzing geographic datasets. The integrated Rule-Oriented Data System (iRODS) is being used to manage federated data content and perform rule-based background actions on data and model resources, including parsing to generate metadata catalog information and the execution of models and workflows. This presentation will introduce the HydroShare functionality developed to date, describe key elements of the Resource Data Model, and outline the roadmap for future development.
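To make the Resource concept concrete, here is a minimal sketch of a resource type separating system metadata from science metadata; the field names are illustrative guesses, not HydroShare's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SystemMetadata:
    # Elements common to every resource, managed by the platform.
    resource_id: str
    owner: str
    created: str          # ISO-8601 timestamp
    resource_type: str    # e.g. "timeseries", "model", "workflow"

@dataclass
class ScienceMetadata:
    # Domain-specific elements describing the hydrologic content.
    title: str
    spatial_coverage: tuple   # (lat, lon) point or bounding box
    temporal_coverage: tuple  # (start, end)
    variables: list = field(default_factory=list)

@dataclass
class Resource:
    system: SystemMetadata
    science: ScienceMetadata
    files: list = field(default_factory=list)

r = Resource(
    SystemMetadata("hs-0001", "alice", "2014-06-01T00:00:00Z", "timeseries"),
    ScienceMetadata("Streamflow at gauge X", (40.0, -111.0),
                    ("2010-01-01", "2013-12-31"), ["discharge"]),
    files=["discharge.csv"],
)
print(r.system.resource_type, r.science.variables)
```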
Abstract:
This paper describes an educational system for teaching and learning robotic systems. Multimedia resources were used to build a virtual laboratory where users can operate a virtual robotic arm by moving and clicking the mouse, without needing to understand the robot's detailed internal operation. Through the multimedia system the user can also interact with a real robot arm. Engineering students are the target audience of the system, which, with its contents and interactive capabilities, has been used to support traditional face-to-face classes on robotics. The paper first introduces the Virtual Laboratory metaphor used in the system. It then describes the Graphical and Multimedia Environment approach: an interactive graphical user interface with a 3D simulation environment. Design and implementation issues of the real-time interactive multimedia learning system, which supports the W3C SMIL standard for presenting real-time multimedia teaching material, are discussed. Finally, preliminary conclusions and possible future work from this research are presented.
Abstract:
Although cluster environments have enormous potential processing power, real applications that take advantage of this power remain an elusive goal. This is due, in part, to a lack of understanding of the characteristics of the applications best suited for these environments. This paper focuses on Master/Slave applications for large heterogeneous clusters. It defines application, cluster and execution models to derive an analytic expression for the execution time. It defines speedup and derives speedup bounds based on the inherent parallelism of the application and the aggregated computing power of the cluster. The paper derives an analytical expression for efficiency and uses it to define the scalability of the algorithm-cluster combination based on the isoefficiency metric. Furthermore, it establishes necessary and sufficient conditions for an algorithm-cluster combination to be scalable, conditions that are easy to verify and use in practice. Finally, it covers the impact of network contention as the number of processors grows.
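For reference, the standard definitions behind such an analysis (the paper's own expressions are more detailed, accounting for heterogeneity and contention) are:

```latex
% Standard speedup / efficiency / isoefficiency definitions, as commonly
% used with the isoefficiency metric; the paper's heterogeneous-cluster
% expressions refine these.
\[
  S(p) = \frac{T_1}{T_p}, \qquad
  E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p},
\]
\[
  p\,T_p = T_1 + T_o(W, p)
  \;\Longrightarrow\;
  E = \frac{1}{1 + T_o(W,p)/T_1},
\]
% where $T_1$ is the sequential time, $T_p$ the parallel time on $p$
% processors, and $T_o(W,p)$ the total overhead for problem size $W$.
% Isoefficiency asks how fast $W$ must grow with $p$ to keep $E$ constant,
% i.e. it solves $T_1(W) = \frac{E}{1-E}\, T_o(W,p)$ for $W$ as a function of $p$.
```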
Abstract:
A significant amount of information stored in different databases around the world can be shared through peer-to-peer databases. This yields a large knowledge base without the need for large investments, because existing databases and infrastructure are reused. However, the structural characteristics of peer-to-peer networks make the process of finding such information complex. Moreover, these databases are often heterogeneous in their schemas, but semantically similar in their content. A good peer-to-peer database system should allow users to access information from databases scattered across the network and receive only the information actually related to their topic of interest. This paper proposes the use of ontologies in peer-to-peer database queries to represent the semantics inherent in the data. The main contribution of this work is to enable integration between heterogeneous databases, to improve the performance of such queries, and to apply the Ant Colony optimization algorithm to the problem of locating information in peer-to-peer networks, yielding an 18% improvement in the results.
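As a rough sketch of how Ant Colony optimization can steer query routing in an overlay (a generic pheromone-table scheme, not the authors' exact algorithm; all names are hypothetical), each node keeps per-neighbor pheromone levels that successful answers reinforce:

```python
import random

class AntRouter:
    """Toy pheromone-based next-hop selection for P2P query routing."""
    def __init__(self, neighbors, evaporation=0.1):
        self.pheromone = {n: 1.0 for n in neighbors}
        self.evaporation = evaporation

    def choose_next_hop(self):
        # Probabilistic choice proportional to pheromone level.
        total = sum(self.pheromone.values())
        r = random.uniform(0, total)
        acc = 0.0
        for node, tau in self.pheromone.items():
            acc += tau
            if r <= acc:
                return node
        return node  # fallback for floating-point edge cases

    def reinforce(self, node, reward=1.0):
        # Evaporate everywhere, then deposit on the successful path.
        for n in self.pheromone:
            self.pheromone[n] *= (1.0 - self.evaporation)
        self.pheromone[node] += reward

router = AntRouter(["peer_a", "peer_b", "peer_c"])
for _ in range(20):
    hop = router.choose_next_hop()
    if hop == "peer_b":          # pretend peer_b answers the query
        router.reinforce("peer_b")
print(router.pheromone)          # peer_b accumulates the most pheromone
```

Queries that succeed reinforce the pheromone trail along their path, so subsequent queries for related topics are biased toward the peers that answered before.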
Abstract:
In a peer-to-peer network, nodes interact with each other by sharing resources, services and information. Many applications have been developed on top of such networks, one class of them being peer-to-peer databases. Peer-to-peer database systems allow the sharing of unstructured data and can integrate data from several sources without the need for large investments, because existing repositories are reused. However, the high flexibility and dynamicity of the network, as well as the absence of centralized management of information, make the process of locating information among the various participants in the network complex. In this context, this paper makes an original contribution by proposing an architecture for a routing system that uses the Ant Colony algorithm to optimize the search for the desired information, supported by ontologies that add semantics to the shared data. This enables integration among heterogeneous databases while reducing message traffic on the network without losses in the number of responses, confirmed by an improvement of 22.5% in this measure.
Abstract:
The multi-relational data mining approach has emerged as an alternative for the analysis of structured data, such as relational databases. Unlike traditional algorithms, multi-relational proposals allow mining multiple tables directly, avoiding costly join operations. In this paper, a comparative study is presented involving the traditional Patricia Mine algorithm and its proposed multi-relational counterpart, MR-Radix, in order to evaluate the performance of the two approaches to mining association rules in relational databases. This study presents two original contributions: the proposal of the multi-relational algorithm MR-Radix, which is efficient for relational databases both in execution time and in memory usage, and an empirical demonstration of the multi-relational approach's performance advantage when mining several tables, which avoids costly join operations across them.
Abstract:
Multi-relational data mining enables pattern mining from multiple tables. Existing multi-relational association rule mining algorithms are unable to process large volumes of data, because the amount of memory required exceeds the amount available. The proposed algorithm MR-Radix provides a framework that optimizes memory usage. It also uses the concept of partitioning to handle large volumes of data. The original contribution of this proposal is superior performance compared to related algorithms, successfully completing the task of mining association rules in large databases and bypassing the problem of limited memory. One of the tests showed that MR-Radix uses fourteen times less memory than GFP-growth.
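The partitioning idea can be illustrated with the classic two-phase scheme (a generic sketch, not MR-Radix itself): mine each memory-sized partition for locally frequent itemsets, then verify the union of the local candidates with one more pass over the data:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Naive single-partition miner: counts all 1- and 2-itemsets."""
    counts = {}
    for t in transactions:
        for size in (1, 2):
            for items in combinations(sorted(t), size):
                counts[items] = counts.get(items, 0) + 1
    n = len(transactions)
    return {i for i, c in counts.items() if c / n >= min_support}

def partitioned_mine(transactions, min_support, partition_size):
    # Phase 1: locally frequent itemsets per memory-sized partition.
    candidates = set()
    for start in range(0, len(transactions), partition_size):
        part = transactions[start:start + partition_size]
        candidates |= frequent_itemsets(part, min_support)
    # Phase 2: one full pass counting only the candidates globally.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        ts = set(t)
        for c in candidates:
            if set(c) <= ts:
                counts[c] += 1
    n = len(transactions)
    return {c for c, k in counts.items() if k / n >= min_support}

data = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"], ["a", "b", "c"]]
print(partitioned_mine(data, min_support=0.4, partition_size=2))
```

The classic guarantee is that any globally frequent itemset must be locally frequent in at least one partition, so the second phase only has to verify candidates rather than rediscover them.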
Abstract:
To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and preparing the data for the later stages, especially data mining. Such problems occur at the instance and schema levels: missing values, null values, duplicate tuples, values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were designed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution a multithreaded optimization of a phonetics-based algorithm for detecting duplicate tuples in databases, which requires neither training data nor support for a specific language.
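As a rough illustration of phonetic duplicate detection (using the well-known Soundex code and Python threads; this is a generic sketch, not the authors' algorithm), records whose names share a phonetic key become duplicate candidates:

```python
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict

CODES = {"b": "1", "f": "1", "p": "1", "v": "1",
         "c": "2", "g": "2", "j": "2", "k": "2", "q": "2",
         "s": "2", "x": "2", "z": "2",
         "d": "3", "t": "3", "l": "4", "m": "5", "n": "5", "r": "6"}

def soundex(word):
    """Classic Soundex: first letter plus three digits."""
    word = word.lower()
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue            # h and w do not separate duplicate codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code             # vowels reset prev and break runs
    return (result + "000")[:4]

def duplicate_candidates(records, workers=4):
    """Group records by the phonetic key of their name field."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(soundex, records))
    groups = defaultdict(list)
    for rec, key in zip(records, keys):
        groups[key].append(rec)
    return {k: v for k, v in groups.items() if len(v) > 1}

names = ["Robert", "Rupert", "Smith", "Smyth", "Ashcraft"]
print(duplicate_candidates(names))
# {'R163': ['Robert', 'Rupert'], 'S530': ['Smith', 'Smyth']}
```

Because the phonetic key of each record is computed independently, the work parallelizes trivially across threads.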
Abstract:
The development of new technologies that use peer-to-peer networks grows every day, driven by the need to share information, resources and database services around the world. Among them are peer-to-peer databases, which take advantage of peer-to-peer networks to manage distributed knowledge bases, allowing the sharing of information that is semantically related but syntactically heterogeneous. However, given the structural characteristics of these networks, it is a challenge to ensure efficient search for information without compromising the autonomy of each node and the flexibility of the network. On the other hand, some studies propose the use of ontology semantics to assign a standardized categorization to the information. The main original contribution of this work is to approach this problem with a proposal for query optimization supported by the Ant Colony algorithm and classification through ontologies. The results show that this strategy provides semantic support to searches in peer-to-peer databases, expanding the results without compromising network performance.
Abstract:
Non-conventional database management systems are used to achieve better performance when dealing with complex data. One fundamental concept of these systems is object identity (OID): each object in the database has a unique identifier that is used to access it and to reference it in relationships with other objects. Two approaches can be used to implement OIDs: physical or logical OIDs. To manage complex data, the Multimedia Data Manager Kernel (NuGeM) was proposed, which uses a logical technique named Indirect Mapping. This paper proposes an improvement to the technique used by NuGeM, whose original contribution is to manage OIDs with fewer disc accesses and less processing, thus reducing page management time and eliminating the problem of OID exhaustion. The technique presented here can also be applied to other OODBMSs.
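The general idea of logical OIDs with indirect mapping (a generic sketch, not NuGeM's implementation) is an extra mapping table between an immutable OID and the object's current physical location, so objects can move on disk without invalidating references:

```python
class IndirectMap:
    """Toy logical-OID table: OID -> (page, slot). Objects can be moved
    between pages by updating one entry; references hold only the OID."""
    def __init__(self):
        self.next_oid = 1
        self.location = {}          # oid -> (page_id, slot)

    def allocate(self, page_id, slot):
        oid = self.next_oid
        self.next_oid += 1
        self.location[oid] = (page_id, slot)
        return oid

    def dereference(self, oid):
        # One extra lookup compared to physical OIDs, but stable ids.
        return self.location[oid]

    def relocate(self, oid, new_page, new_slot):
        # Moving the object costs a single map update, not a scan of
        # every object that references it.
        self.location[oid] = (new_page, new_slot)

m = IndirectMap()
oid = m.allocate(page_id=7, slot=0)
m.relocate(oid, new_page=12, new_slot=3)   # e.g. after page compaction
print(oid, m.dereference(oid))             # 1 (12, 3)
```

The price is one extra lookup per dereference; the gain is that reorganizing pages never touches inter-object references.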
Abstract:
The large number of work accidents in cities causes significant losses to society. The development of Spatial Data Mining technologies offers a new perspective for extracting knowledge from the correlation between conventional and spatial attributes. One of the most important Spatial Data Mining techniques is Spatial Clustering, which groups similar spatial objects to find a distribution of patterns, taking into account the geographical position of the objects. Applying this technique to the health area provides information that can contribute to the planning of more adequate strategies for the prevention of work accidents. The original contribution of this work is an application of tools developed for Spatial Clustering, which supply a set of graphic resources that have helped to discover knowledge and support management in the work accidents area.
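As a minimal illustration of spatial clustering (plain k-means on planar coordinates; the tools described above are more elaborate), accident locations can be grouped by proximity:

```python
import random

def kmeans(points, k, iters=20):
    """Basic k-means on (x, y) points; returns centroids and labels."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: (p[0] - centroids[c][0]) ** 2 +
                                    (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Recompute each centroid as the mean of its cluster.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

# Two synthetic accident hot spots around (0, 0) and (10, 10).
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)] + \
      [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(50)]
centroids, labels = kmeans(pts, k=2)
print(centroids)  # approximately (0, 0) and (10, 10)
```

Real spatial clustering tools typically also handle density-based cluster shapes and geographic projections, which plain k-means ignores.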