105 results for Sparse distributed memory
in CentAUR: Central Archive University of Reading - UK
Abstract:
One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have largely been explored in two main directions. First, the amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set, and a third solution incorporates a dynamic load balancing policy.
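As a rough illustration of the static-partitioning idea mentioned in this abstract, the sketch below runs one k-Means iteration over a data set that has already been split into partitions, one per hypothetical processing node. Each partition produces partial sums and counts, which are then combined into new centroids. It uses a brute-force nearest-centroid search rather than a KD-Tree, and the function names (`assign_and_accumulate`, `kmeans_step`) are assumptions made for this example, not the authors' implementation.

```python
from math import dist


def assign_and_accumulate(points, centroids):
    """Work done by one node: per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Brute-force nearest centroid (a KD-Tree would prune this search).
        j = min(range(k), key=lambda c: dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans_step(partitions, centroids):
    """One iteration: combine the partial results of all partitions."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for part in partitions:
        sums, counts = assign_and_accumulate(part, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    new_centroids = []
    for j in range(k):
        if total_counts[j] == 0:
            # Keep an empty cluster's centroid where it was.
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append([s / total_counts[j] for s in total_sums[j]])
    return new_centroids
```

The result does not depend on how the points are partitioned, only the work per node does, which is why an uneven split (or uneven pruning in a KD-Tree variant) shows up as load imbalance rather than as a different clustering.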
Abstract:
An n-dimensional Möbius cube, 0MQ(n) or 1MQ(n), is a variation of the n-dimensional cube Q(n) which possesses many attractive properties, such as significantly smaller communication delay and stronger graph-embedding capabilities. In some practical situations, the fault tolerance of a distributed memory multiprocessor system can be measured more precisely by the connectivity of the underlying graph under forbidden fault set models. This article addresses the connectivity of 0MQ(n)/1MQ(n) under two typical forbidden fault set models. We first prove that the connectivity of 0MQ(n)/1MQ(n) is 2n - 2 when the fault set does not contain the neighborhood of any vertex as a subset. We then prove that the connectivity of 0MQ(n)/1MQ(n) is 3n - 5 provided that the neighborhood of any vertex, as well as that of any edge, cannot fail simultaneously. These results demonstrate that 0MQ(n)/1MQ(n) has the same connectivity as Q(n) under either of the previous assumptions.
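The abstract does not restate how the Möbius cube is built. The sketch below assumes the commonly used definition: a vertex is a bit string x_1...x_n, and its i-th neighbor flips bit i alone when x_(i-1) = 0, or flips bits i through n when x_(i-1) = 1, with the undefined x_0 taken as 0 for 0MQ(n) and 1 for 1MQ(n). The function names are hypothetical, and the code only checks basic structure (n-regularity, connectedness), not the connectivity results of the article.

```python
from collections import deque
from itertools import product


def mobius_neighbors(x, n, kind):
    """Neighbors of vertex x (tuple of n bits) in 0MQ(n) (kind=0) or 1MQ(n) (kind=1)."""
    neighbors = []
    for i in range(n):
        # The bit to the left of position i decides the rule; for the first
        # position it is the cube type (assumed definition, see lead-in).
        prev = kind if i == 0 else x[i - 1]
        y = list(x)
        if prev == 0:
            y[i] ^= 1                  # flip bit i only
        else:
            for j in range(i, n):      # flip bit i and every bit after it
                y[j] ^= 1
        neighbors.append(tuple(y))
    return neighbors


def build_mobius_cube(n, kind):
    """Adjacency lists of the whole cube, keyed by vertex."""
    return {v: mobius_neighbors(v, n, kind) for v in product((0, 1), repeat=n)}


def is_connected(graph):
    """Breadth-first search from an arbitrary vertex."""
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(graph)
```

Under this definition every vertex has exactly n distinct neighbors, as in Q(n); the difference lies in which vertices are joined, not in how many.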
Abstract:
With the transition to multicore processors almost complete, the parallel processing community is seeking efficient ways to port legacy message passing applications to shared memory and multicore processors. MPJ Express is our reference implementation of Message Passing Interface (MPI)-like bindings for the Java language. Starting with the current release, the MPJ Express software can be configured in two modes: the multicore mode and the cluster mode. In the multicore mode, parallel Java applications execute on shared memory or multicore processors. In the cluster mode, Java applications parallelized using MPJ Express can be executed on distributed memory platforms such as compute clusters and clouds. The multicore device has been implemented using Java threads in order to satisfy the two main design goals of portability and performance. We also discuss the challenges of integrating the multicore device into the MPJ Express software. This turned out to be a challenging task because the parallel application executes in a single JVM in the multicore mode. In the cluster mode, by contrast, the parallel user application executes in multiple JVMs. Because of these inherent architectural differences between the two modes, the MPJ Express runtime is modified to ensure correct semantics of the parallel program. Towards the end, we compare the performance of MPJ Express (multicore mode) with other C and Java message passing libraries, including mpiJava, MPJ/Ibis, MPICH2 and MPJ Express (cluster mode), on shared memory and multicore processors. We found that MPJ Express performs significantly better in the multicore mode than in the cluster mode. MPJ Express also performs better than other Java messaging libraries, including mpiJava and MPJ/Ibis, when used in the multicore mode on shared memory or multicore processors.
We also demonstrate the effectiveness of the MPJ Express multicore device in Gadget-2, which is a massively parallel astrophysics N-body simulation code.
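To make the "multicore mode" idea concrete, here is a small sketch of message passing between ranks that are threads inside one process, with in-memory queues standing in for the communication device. It is written in Python for illustration only and is not the MPJ Express API; `run_ranks`, `send`, `recv` and `sum_ranks` are hypothetical names introduced for this example.

```python
import queue
import threading


def run_ranks(size, body):
    """Run body(rank, size, send, recv) on `size` threads in a single process."""
    # One mailbox per rank; sending is just putting a message in a queue.
    mailboxes = [queue.Queue() for _ in range(size)]
    results = [None] * size

    def send(dest, message):
        mailboxes[dest].put(message)

    def worker(rank):
        recv = mailboxes[rank].get  # blocks until a message arrives
        results[rank] = body(rank, size, send, recv)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def sum_ranks(rank, size, send, recv):
    """Every non-zero rank sends its rank number to rank 0, which adds them up."""
    if rank == 0:
        return sum(recv() for _ in range(size - 1))
    send(0, rank)
    return None
```

For example, `run_ranks(4, sum_ranks)` returns `[6, None, None, None]`. Because all ranks share one address space, a message never has to leave the process, which is the structural difference from a cluster mode where each rank would live in its own runtime.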
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
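The abstract does not say which epidemic protocol is used, so the following is only a sketch of one standard gossip aggregation primitive, push-sum averaging, simulated in synchronous rounds. Each node keeps a (sum, weight) pair, keeps half for itself and sends the other half to a randomly chosen peer; every node's ratio sum/weight drifts towards the global mean without any global communication step. The function name `push_sum` and the simulation setup are assumptions for this example.

```python
import random


def push_sum(values, rounds, seed=0):
    """Simulate push-sum gossip; returns each node's estimate of the mean.

    Requires at least two nodes, since every node needs a peer to talk to.
    """
    rng = random.Random(seed)
    n = len(values)
    sums = [float(v) for v in values]
    weights = [1.0] * n
    for _ in range(rounds):
        new_sums = [0.0] * n
        new_weights = [0.0] * n
        for i in range(n):
            half_sum, half_weight = sums[i] / 2, weights[i] / 2
            # Pick a random peer other than node i itself.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Keep one half locally, push the other half to the peer.
            new_sums[i] += half_sum
            new_weights[i] += half_weight
            new_sums[j] += half_sum
            new_weights[j] += half_weight
        sums, weights = new_sums, new_weights
    return [s / w for s, w in zip(sums, weights)]
```

In a K-Means setting, the gossiped quantities would be the per-cluster coordinate sums and point counts held by each node, so that every node can estimate the global centroids locally. Running more rounds tightens the estimates, which is one way to read "as closely as desired".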
Abstract:
In this paper, we study the periodic oscillatory behavior of a class of bidirectional associative memory (BAM) networks with finite distributed delays. A set of criteria is proposed for determining global exponential periodicity of the proposed BAM networks; these criteria assume neither differentiability nor monotonicity of the activation function of each neuron. In addition, our criteria are easily checkable. (c) 2005 Elsevier Inc. All rights reserved.
Abstract:
This article reviews current technological developments, particularly Peer-to-Peer technologies and Distributed Data Systems, and their value to community memory projects, particularly those concerned with the preservation of the cultural, literary and administrative data of cultures which have suffered genocide or are at risk of genocide. It draws attention to the comparatively good representation online of genocide denial groups and to changes in the technological strategies of Holocaust denial and other far-right groups. It draws on the author's work in providing IT support for a UK-based Non-Governmental Organization providing support for survivors of genocide in Rwanda.
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; and finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Abstract:
The goal of this work is the efficient solution of the heat equation with Dirichlet or Neumann boundary conditions using the Boundary Elements Method (BEM). Efficiently solving the heat equation is useful, as it is a simple model problem for other types of parabolic problems. In complicated spatial domains as often found in engineering, BEM can be beneficial since only the boundary of the domain has to be discretised. This makes BEM easier than domain methods such as finite elements and finite differences, conventionally combined with time-stepping schemes to solve this problem. The contribution of this work is to further decrease the complexity of solving the heat equation, leading both to speed gains (in CPU time) as well as requiring smaller amounts of memory to solve the same problem. To do this we will combine the complexity gains of boundary reduction by integral equation formulations with a discretisation using wavelet bases. This reduces the total work to O(h
Abstract:
At its most fundamental, cognition as displayed by biological agents (such as humans) may be said to consist of the manipulation and utilisation of memory. Recent discussions in the field of cognitive robotics have emphasised the role of embodiment and the necessity of a value or motivation for autonomous behaviour. This work proposes a computational architecture – the Memory-Based Cognitive (MBC) architecture – based upon these considerations for the autonomous development of control of a simple mobile robot. This novel architecture will permit the exploration of theoretical issues in cognitive robotics and animal cognition. Furthermore, the biological inspiration of the architecture is anticipated to result in a mobile robot controller which displays adaptive behaviour in unknown environments.
Abstract:
We use an empirical statistical model to demonstrate significant skill in making extended-range forecasts of the monthly-mean Arctic Oscillation (AO). Forecast skill derives from persistent circulation anomalies in the lowermost stratosphere and is greatest during boreal winter. A comparison to the Southern Hemisphere provides evidence that both the time scale and predictability of the AO depend on the presence of persistent circulation anomalies just above the tropopause. These circulation anomalies most likely affect the troposphere through changes to waves in the upper troposphere, which induce surface pressure changes that correspond to the AO.
Abstract:
There are at least three distinct time scales that are relevant for the evolution of atmospheric convection. These are the time scale of the forcing mechanism, the time scale governing the response to a steady forcing, and the time scale of the response to variations in the forcing. The last of these, tmem, is associated with convective life cycles, which provide an element of memory in the system. A highly simplified model of convection is introduced, which allows for investigation of the character of convection as a function of the three time scales. For short tmem, the convective response is strongly tied to the forcing as in conventional equilibrium parameterization. For long tmem, the convection responds only to the slowly evolving component of forcing, and any fluctuations in the forcing are essentially suppressed. At intermediate tmem, convection becomes less predictable: conventional equilibrium closure breaks down and current levels of convection modify the subsequent response.
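The abstract does not give the equations of its simplified model. As a loose illustration of the two limiting cases it describes, the sketch below uses a plain first-order relaxation of a response towards a periodic forcing, with the memory time scale (`t_mem` in the code) as the relaxation time. This is a stand-in chosen for this example, not the model of the paper, and it does not reproduce the less predictable intermediate regime; the forcing shape, step size and function names are all assumptions.

```python
import math


def relax_to_forcing(t_mem, period=1.0, dt=0.001, n_steps=5000):
    """Integrate dM/dt = (F(t) - M) / t_mem with F(t) = 1 + sin(2*pi*t/period)."""
    # Exact decay factor over one step, so the update is stable for any t_mem.
    alpha = 1.0 - math.exp(-dt / t_mem)
    m = 1.0  # start at the time-mean of the forcing
    history = []
    for k in range(n_steps):
        forcing = 1.0 + math.sin(2.0 * math.pi * k * dt / period)
        m += alpha * (forcing - m)
        history.append(m)
    return history


def response_amplitude(history, steps_per_period):
    """Half the peak-to-peak range of the response over the final forcing period."""
    tail = history[-steps_per_period:]
    return (max(tail) - min(tail)) / 2.0
```

With a memory time much shorter than the forcing period the response follows the forcing almost exactly (amplitude close to 1), while with a memory time much longer than the period the fluctuations are almost entirely filtered out and only the slowly varying mean remains.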