52 results for distributed computing projects
Abstract:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data poses unique challenges for data mining algorithms, such as concept drift, the need to analyse data on the fly because streams are unbounded, and the need for scalable algorithms because data throughput is potentially high. Fast real-time classification algorithms that adapt to concept drift exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN uses an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.
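As an illustration of the micro-cluster idea described in this abstract, the following is a minimal sketch and not the authors' MC-NN implementation. It assumes each micro-cluster keeps only a class label, a per-feature linear sum and a count, and that classification picks the label of the nearest centroid. The class and method names (`MicroCluster`, `MicroClusterNN`, `learn`, `predict`) are hypothetical, and the drift-handling steps of a real stream classifier (splitting or discarding summaries) are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical summary of all examples absorbed so far for one label."""
    label: str
    linear_sum: list  # per-feature sum of absorbed examples
    count: int = 1    # number of absorbed examples

    def centroid(self):
        return [s / self.count for s in self.linear_sum]

    def absorb(self, x):
        # Incremental update: no raw examples are stored.
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]
        self.count += 1


class MicroClusterNN:
    """Simplified nearest-centroid stream classifier (assumption: no splitting)."""

    def __init__(self):
        self.clusters = []

    def _nearest(self, x, label=None):
        # Optionally restrict the search to clusters of one label.
        candidates = [c for c in self.clusters
                      if label is None or c.label == label]
        if not candidates:
            return None
        return min(candidates, key=lambda c: math.dist(c.centroid(), x))

    def predict(self, x):
        nearest = self._nearest(x)
        return None if nearest is None else nearest.label

    def learn(self, x, label):
        # Fold the example into the nearest summary of the same label,
        # or start a new summary if this label has not been seen yet.
        same = self._nearest(x, label)
        if same is None:
            self.clusters.append(MicroCluster(label, list(x)))
        else:
            same.absorb(x)
```

Because each summary is updated independently, the nearest-centroid search could in principle be partitioned across workers, which is the kind of parallelism the abstract refers to.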
Abstract:
Replacing and upgrading assets in the electricity network requires financial investment from the distribution and transmission utilities. It also has an emissions impact because of the carbon embodied in the materials used to manufacture network assets. This paper uses investment and asset data for the GB system for 2015-2023 to assess the suitability of using a proxy with peak demand data and network investment data to calculate the carbon impacts of network investments. The proxies are calculated on a regional basis and applied to calculate the embodied carbon associated with current network assets by DNO region. The proxies are also applied to peak demand data across the 2015-2023 period to estimate the expected levels of embodied carbon that will be associated with network investment during this period. The suitability of these proxies in different contexts is then discussed, along with initial scenario analysis to calculate the impact of avoiding or deferring network investments through distributed generation projects. The proxies were found to be effective in estimating the total embodied carbon of electricity system investment in order to compare investment strategies in different regions of the GB network.
Abstract:
Epidemic protocols are a bio-inspired communication and computation paradigm for extreme-scale network systems based on randomized communication. The protocols rely on a membership service to build decentralized and random overlay topologies. In a weakly connected overlay topology, a naive membership protocol mechanism can break the connectivity, thus impairing the accuracy of the application. This work investigates the factors in membership protocols that cause the loss of global connectivity and introduces the first topology connectivity recovery mechanism. The mechanism is integrated into the Expander Membership Protocol, which is then evaluated against other membership protocols. The analysis shows that the proposed connectivity recovery mechanism is effective in preserving topology connectivity and also helps to improve the application performance in terms of convergence speed.
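To make the notion of epidemic computation over randomized communication concrete, here is a minimal sketch of gossip-based averaging, a standard example of the paradigm. It is not the Expander Membership Protocol or the recovery mechanism from this abstract. It assumes a fully connected overlay in which any node can contact any other, which is exactly the connectivity a membership service is meant to preserve. The function names `gossip_round` and `run_gossip` are hypothetical.

```python
import random


def gossip_round(values, rng):
    """One round: each node averages its value with one random peer (push-pull)."""
    n = len(values)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # skip self so the peer is always a different node
        mean = (values[i] + values[j]) / 2.0
        values[i] = values[j] = mean  # pairwise exchange conserves the total


def run_gossip(values, rounds, seed=0):
    """Run several rounds on a copy of the input and return the node values."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    values = list(values)
    for _ in range(rounds):
        gossip_round(values, rng)
    return values
```

Each exchange leaves the sum of all values unchanged, so every node drifts towards the global mean. If the overlay splits into disconnected parts, each part converges to its own local mean instead, which is one way a loss of connectivity can impair application accuracy.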
Abstract:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data and a data warehouse. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: first, how grid data management technologies can be used to access the distributed data warehouses; and second, how the grid can be used to transfer analysis programs to the primary repositories. The latter is an important and challenging aspect of P-found because the data volumes involved are too large to be centralised. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling new scientific discoveries.
Abstract:
Medical universities and teaching hospitals in Iraq are facing a lack of professional staff because ongoing violence is forcing them to flee the country. The professionals are now dispersed outside the country, which reduces the chances for staff and students to be physically in one place to continue teaching and limits the efficiency of consultations in hospitals. A survey was conducted among students and professional staff in Iraq to identify the problems in the learning and clinical systems and how Information and Communication Technology could improve them. The survey showed that 86% of the participants use the Internet as a learning resource and 25% for clinical purposes, while fewer than 11% use it for collaboration between different institutions. A web-based collaborative tool is proposed to improve the teaching and clinical systems. The tool helps users to collaborate remotely to increase the quality of the learning system, and it can also be used for remote medical consultation in hospitals.
Abstract:
We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation was carried out on a parallel frequent subgraph mining algorithm, which showed good scalability.
Abstract:
Recently, two approaches have been introduced that distribute the molecular fragment mining problem. The first approach applies a master/worker topology; the second, a completely distributed peer-to-peer system, solves the scalability problem caused by the bottleneck at the master node. However, in many real-world scenarios the participating computing nodes cannot communicate directly due to administrative policies such as security restrictions. Thus, potential computing power is not accessible to accelerate the mining run. To overcome this shortcoming, this work introduces a hierarchical topology of computing resources, which distributes the management over several levels and adapts to the natural structure of those multi-domain architectures. The most important aspect is the load balancing scheme, which has been designed and optimized for the hierarchical structure. The approach allows dynamic aggregation of heterogeneous computing resources and is applied to wide area network scenarios.
Abstract:
In real-world applications, sequential algorithms for data mining and data exploration are often unsuitable for datasets of enormous size, high dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is a need to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.
Abstract:
Space applications demand reliable systems. Autonomic computing defines such reliable systems as self-managing systems. The work reported in this paper combines agent-based and swarm robotic approaches, leading to swarm-array computing, a novel technique to achieve self-managing distributed parallel computing systems. Two swarm-array computing approaches, based on swarms of computational resources and swarms of tasks, are explored. An FPGA is considered as the computing system. The feasibility of the two proposed approaches, which bind the computing system and the task together, is simulated on the SeSAm multi-agent simulator.
Abstract:
Synchronous collaborative systems allow geographically distributed participants to form a virtual work environment enabling cooperation between peers and enriching the human interaction. The technology facilitating this interaction has been studied for several years and various solutions can be found at present. In this paper, we discuss our experiences with one such widely adopted technology, namely the Access Grid. We describe our experiences with using this technology, identify key problem areas and propose our solution to tackle these issues appropriately. Moreover, we propose the integration of Access Grid with an Application Sharing tool, developed by the authors. Our approach allows these integrated tools to utilise the enhanced features provided by our underlying dynamic transport layer.
Abstract:
How can a bridge be built between autonomic computing approaches and parallel computing systems? The work reported in this paper aims to bridge this gap by proposing a swarm-array computing approach based on ‘Intelligent Agents’ to achieve autonomy for distributed parallel computing systems. In the proposed approach, a task to be executed on parallel computing cores is carried onto a computing core by carrier agents that can seamlessly transfer between processing cores in the event of a predicted failure. The cognitive capabilities of the carrier agents on a parallel processing core serve to achieve the self-ware objectives of autonomic computing, hence applying autonomic computing concepts for the benefit of parallel computing systems. The feasibility of the proposed approach is validated by simulation studies using a multi-agent simulator on an FPGA (Field-Programmable Gate Array) and experimental studies using MPI (Message Passing Interface) on a computer cluster. Preliminary results confirm that applying autonomic computing principles to parallel computing systems is beneficial.
Abstract:
Clusters of computers can be used together to provide a powerful computing resource. Large Monte Carlo simulations, such as those used to model particle growth, are computationally intensive and take considerable time to execute on conventional workstations. By spreading the work of the simulation across a cluster of computers, the elapsed execution time can be greatly reduced. Thus a user apparently has the performance of a supercomputer by using the spare cycles on other workstations.
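The idea of spreading a Monte Carlo simulation across workers can be sketched as follows. This is an illustrative stand-in, not the particle growth model from the abstract: it estimates pi by random sampling, and it uses local processes via Python's `concurrent.futures.ProcessPoolExecutor` in place of cluster workstations. The function names `simulate_chunk`, `estimate_pi` and `estimate_pi_parallel` are hypothetical.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(job):
    """Run one independent batch of trials and return the number of hits."""
    seed, trials = job
    rng = random.Random(seed)  # per-chunk seed: chunks are independent and repeatable
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return hits


def estimate_pi(total_trials, chunks, map_fn=map):
    """Split the trials into equal chunks, run them via map_fn, merge the counts."""
    per_chunk = total_trials // chunks
    jobs = [(seed, per_chunk) for seed in range(chunks)]
    hits = sum(map_fn(simulate_chunk, jobs))
    return 4.0 * hits / (per_chunk * chunks)


def estimate_pi_parallel(total_trials, workers):
    """Same computation, with one chunk per worker process (stand-in for a cluster)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return estimate_pi(total_trials, workers, map_fn=pool.map)
```

Because the chunks share no state and only their hit counts are merged, the sequential and parallel paths give the same estimate for the same chunk count. On platforms that spawn worker processes, `estimate_pi_parallel` should be called from under an `if __name__ == "__main__":` guard.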