967 results for Parallel or distributed processing


Relevance:

100.00%

Abstract:

Advances in hardware and software technology enable us to collect, store and distribute data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.

Relevance:

100.00%

Abstract:

One common drawback of algorithms for learning linear causal models is that they cannot deal with incomplete data sets. This is unfortunate, since many real problems involve missing data or even hidden variables. In this paper, based on multiple imputation, we propose a three-step process to learn linear causal models from incomplete data sets. Experimental results indicate that this algorithm is better than the single imputation method (EM algorithm) and the simple list deletion method, and that at lower missing rates it can even find models better than those produced by the greedy learning algorithm MLGS working on a complete data set. In addition, the method is amenable to parallel or distributed processing, which is an important characteristic for data mining in large data sets.

Relevance:

100.00%

Abstract:

Parameter estimation is one of the key issues involved in the discovery of graphical models from data. Current state-of-the-art methods have demonstrated their abilities on different kinds of graphical models. In this paper, we introduce ensemble learning into the process of parameter estimation, and examine ensemble parameter estimation methods for different kinds of graphical models with both complete and incomplete data sets. We provide experimental results which show that the ensemble method can achieve better accuracy than the base parameter estimation method. In addition, the method is amenable to parallel or distributed processing, which is an important characteristic for data mining in large data sets.

Relevance:

100.00%

Abstract:

In the last 30 to 40 years, many researchers have collectively built the knowledge base of theory and solution techniques that can be applied to differential equations which include the effects of noise. This class of "noisy" differential equations is now known as stochastic differential equations (SDEs). Markov diffusion processes are included within the field of SDEs through the drift and diffusion components of the Itô form of an SDE. When these drift and diffusion components are moderately smooth functions, the processes' transition probability densities satisfy the Fokker-Planck-Kolmogorov (FPK) equation, an ordinary partial differential equation (PDE). Thus there is a mathematical inter-relationship that allows solutions of SDEs to be determined from the solution of a noise-free differential equation which has been extensively studied since the 1920s. The main numerical solution technique employed to solve the FPK equation is the classical Finite Element Method (FEM). The FEM is of particular importance to engineers when used to solve FPK systems that describe noisy oscillators. The FEM is a powerful tool but is limited in that it is cumbersome when applied to multidimensional systems and can lead to large and complex matrix systems with their inherent solution and storage problems. I show in this thesis that the stochastic Taylor series (TS) based time discretisation approach to the solution of SDEs is an efficient and accurate technique that provides transition and steady state solutions to the associated FPK equation. The TS approach to the solution of SDEs has certain advantages over the classical techniques. These advantages include its ability to effectively tackle stiff systems, its simplicity of derivation and its ease of implementation and re-use. 
Unlike the FEM approach, which is difficult to apply in even only two dimensions, the simplicity of the TS approach is independent of the dimension of the system under investigation. Its main disadvantage, that of requiring a large number of simulations and the associated CPU requirements, is countered by its underlying structure, which makes it perfectly suited for use on the now prevalent parallel or distributed processing systems. In summary, I will compare the TS solution of SDEs to the solution of the associated FPK equations using the classical FEM technique. One, two and three dimensional FPK systems that describe noisy oscillators have been chosen for the analysis. As higher dimensional FPK systems are rarely mentioned in the literature, the TS approach will be extended to essentially infinite dimensional systems through the solution of stochastic PDEs. In making these comparisons, the advantages of modern computing tools such as computer algebra systems and simulation software, when used as an adjunct to the solution of SDEs or their associated FPK equations, are demonstrated.
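
The link between simulating an SDE many times and solving its FPK equation can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes the Euler-Maruyama scheme (the lowest-order strong Taylor scheme) and an Ornstein-Uhlenbeck test equation dX = -theta*X dt + sigma dW, whose stationary FPK density is Gaussian with variance sigma^2 / (2*theta). The function names, parameter values and seeds are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate_batch(seed, n_paths=500, theta=1.0, sigma=0.5, dt=0.01, n_steps=500):
    """Return Euler-Maruyama endpoints of dX = -theta*X dt + sigma dW, X(0) = 0.

    Each batch owns its random stream, so batches do not depend on each other.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    endpoints = []
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            # Drift term plus a Gaussian increment with variance dt.
            x += -theta * x * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        endpoints.append(x)
    return endpoints


def stationary_variance_estimate(seeds):
    """Pool the endpoints of all batches and return their sample variance."""
    samples = [x for batch in map(simulate_batch, seeds) for x in batch]
    return statistics.pvariance(samples)


# Four independent batches (assumed seeds 0..3), pooled into one estimate.
estimate = stationary_variance_estimate(range(4))
# Variance of the stationary FPK solution for the assumed theta=1.0, sigma=0.5.
exact = 0.5 ** 2 / (2 * 1.0)
print(f"simulated variance: {estimate:.4f}, stationary FPK variance: {exact:.4f}")
```

Because the batches share no state, the built-in `map` could be replaced by the `map` method of a `concurrent.futures` executor to spread them over several workers, which is the sense in which simulation-based approaches of this kind suit parallel or distributed systems.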

Relevance:

100.00%

Abstract:

The Optimum-Path Forest (OPF) classifier is a recent and promising method for pattern recognition, with a fast training algorithm and good accuracy results. Therefore, the investigation of a combining method for this kind of classifier can be important for many applications. In this paper we report a fast method to combine OPF-based classifiers trained with disjoint training subsets. Given a fixed number of subsets, the algorithm chooses random samples, without replacement, from the original training set. The accuracy of each subset is improved by a learning procedure. The final decision is given by majority vote. Experiments with simulated and real data sets showed that the proposed combining method is more efficient and effective than the naive approach, provided some conditions hold. It was also shown that the OPF training step runs faster for a series of small subsets than for the whole training set. The combining scheme was also designed to support parallel or distributed processing, speeding up the procedure even more. © 2011 Springer-Verlag.
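
The combining scheme described above (disjoint random subsets, one classifier per subset, majority vote) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: a simple nearest-centroid learner stands in for OPF, the per-subset learning procedure mentioned in the abstract is omitted, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def split_disjoint(samples, n_subsets, seed=0):
    """Shuffle once and deal samples out, so no sample lands in two subsets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def train_centroids(subset):
    """Stand-in base learner (NOT OPF): the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for features, label in subset:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict_one(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def majority_vote(models, features):
    """Final decision: the label predicted by the most base classifiers."""
    votes = Counter(predict_one(model, features) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two Gaussian classes in two dimensions.
rng = random.Random(42)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], "a") for _ in range(60)]
data += [([rng.gauss(4, 1), rng.gauss(4, 1)], "b") for _ in range(60)]

subsets = split_disjoint(data, n_subsets=3)
# Each model is trained independently of the others.
models = [train_centroids(subset) for subset in subsets]
print(majority_vote(models, [0.0, 0.0]), majority_vote(models, [4.0, 4.0]))
```

Since each model depends only on its own subset, the training loop is the part that could be handed to separate workers; only the final vote needs all models together.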

Relevance:

100.00%

Abstract:

An approach to the management of non-functional concerns in massively parallel and/or distributed architectures that marries parallel programming patterns with autonomic computing is presented. The necessity and suitability of the adoption of autonomic techniques are evidenced. Issues arising in the implementation of autonomic managers taking care of multiple concerns and of coordination among hierarchies of such autonomic managers are discussed. Experimental results are presented that demonstrate the feasibility of the approach.

Relevance:

100.00%

Abstract:

The manipulation and handling of an ever increasing volume of data by current data-intensive applications require novel techniques for efficient data management. Despite recent advances in every aspect of data management (storage, access, querying, analysis, mining), future applications are expected to scale to even higher degrees, not only in terms of volumes of data handled but also in terms of users and resources, often making use of multiple, pre-existing autonomous, distributed or heterogeneous resources.

Relevance:

100.00%

Abstract:

The paper describes the educational complex "Multi-agent Technologies for Parallel and Distributed Information Processing in Telecommunication Networks".

Relevance:

100.00%

Abstract:

Embedded real-time applications increasingly present high computation requirements, which need to be completed within specific deadlines but which follow highly variable patterns, depending on the set of data available at a given instant. The current trend to provide parallel processing in the embedded domain allows higher processing power; however, it does not address the variability in the processing pattern. Dimensioning each device for its worst-case scenario implies lower average utilization and more processing capacity that is available, but unusable, in the overall system. A solution to this problem is to extend the parallel execution of the applications, allowing networked nodes to distribute the workload, in peak situations, to neighbour nodes. In this context, this report proposes a framework to develop parallel and distributed real-time embedded applications, transparently using OpenMP and Message Passing Interface (MPI), within a programming model based on OpenMP. The technical report also devises an integrated timing model, which enables structured reasoning on the timing behaviour of these hybrid architectures.