283 results for mining algorithm


Relevance:

20.00%

Publisher:

Abstract:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
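
One way to see how cluster centroids can be computed without global communication is pairwise gossip averaging. The sketch below is an illustration under stated assumptions, not the Epidemic K-Means protocol itself: the names `gossip_average` and `centroid_estimates`, the one-dimensional data and the random pairing rule are all hypothetical. Each node starts with the local sum and count of the points it assigned to one cluster. After repeated pairwise averaging every node holds approximately the network-wide average sum and average count, and their ratio approximates the centroid a centralised algorithm would compute over the aggregated data.

```python
import random


def gossip_average(values, rounds=200, seed=0):
    """Pairwise gossip: repeatedly replace two nodes' vectors by their mean.

    Each exchange preserves the network-wide total, so every node's vector
    drifts toward the global average without any global communication step.
    """
    rng = random.Random(seed)
    est = [list(v) for v in values]
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            # Pick a random peer j different from i (hypothetical pairing rule).
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            mean = [(a + b) / 2 for a, b in zip(est[i], est[j])]
            est[i], est[j] = mean, list(mean)
    return est


def centroid_estimates(local_points):
    """Each node estimates one cluster centroid from its gossiped (sum, count)."""
    start = [[sum(p), float(len(p))] for p in local_points]
    return [s / c for s, c in gossip_average(start)]


# Four nodes, each holding the 1-D points it assigned to the same cluster.
nodes = [[1.0, 2.0], [3.0], [10.0, 11.0, 12.0], [4.0]]
estimates = centroid_estimates(nodes)
true_centroid = sum(sum(p) for p in nodes) / sum(len(p) for p in nodes)
```

Increasing `rounds` tightens the agreement between each node's estimate and `true_centroid`, which mirrors the "as closely as desired" property claimed in the abstract.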

Relevance:

20.00%

Publisher:

Abstract:

Recent research has shown that Lighthill–Ford spontaneous gravity wave generation theory, when applied to numerical model data, can help predict areas of clear-air turbulence. It is hypothesized that this is the case because spontaneously generated atmospheric gravity waves may initiate turbulence by locally modifying the stability and wind shear. As an improvement on the original research, this paper describes the creation of an ‘operational’ algorithm (ULTURB) with three modifications to the original method: (1) extending the altitude range for which the method is effective downward to the top of the boundary layer, (2) adding turbulent kinetic energy production from the environment to the locally produced turbulent kinetic energy production, and (3) transforming turbulent kinetic energy dissipation to eddy dissipation rate, the turbulence metric that is becoming the worldwide ‘standard’. In a comparison of ULTURB with the original method and with the Graphical Turbulence Guidance second version (GTG2) automated procedure for forecasting mid- and upper-level aircraft turbulence, ULTURB performed better for all turbulence intensities. Since ULTURB, unlike GTG2, is founded on a self-consistent dynamical theory, it may offer forecasters better insight into the causes of the clear-air turbulence and may ultimately enhance its predictability.
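
The transformation in modification (3) can be illustrated with the usual definition of eddy dissipation rate as the cube root of the turbulent kinetic energy dissipation rate. This is a minimal sketch, assuming the dissipation rate is given in m² s⁻³; the function name `eddy_dissipation_rate` and the sample value are illustrative and are not taken from ULTURB.

```python
def eddy_dissipation_rate(epsilon):
    """Convert a TKE dissipation rate (m^2 s^-3) to EDR (m^(2/3) s^-1)."""
    if epsilon < 0:
        raise ValueError("dissipation rate must be non-negative")
    return epsilon ** (1.0 / 3.0)


edr = eddy_dissipation_rate(0.001)  # illustrative input value
```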

Relevance:

20.00%

Publisher:

Abstract:

This paper contributes to the debate on child labor in small-scale mining communities, focusing specifically on the situation in sub-Saharan Africa. It argues that the child labor now widespread in many of the region’s small-scale mining communities is a product of a combination of cultural issues, household-level poverty and rural livelihood diversification. Experiences from Komana West, a subsistence gold panning area in Southern Mali, are drawn upon to make this case. The findings suggest that the sector’s child labor “problem” is far more nuanced than international organizations and policymakers have diagnosed.

Relevance:

20.00%

Publisher:

Abstract:

This paper critically reflects on why, in many rural stretches of sub-Saharan Africa, scores of people engage in artisanal and small-scale mining (ASM) activity – low-tech, labour intensive mineral extraction – for lengthy periods of time. It argues that a large share of the region’s ASM operators have mounting debts which prevent them from pursuing alternative, less arduous, employment. The paper concludes with an analysis of findings from research carried out by the author in Talensi-Nabdam District, Northern Ghana, which captures the essence of the poverty trap now plaguing so many ASM communities in sub-Saharan Africa.

Relevance:

20.00%

Publisher:

Abstract:

In this paper we propose an efficient two-level model identification method for a large class of linear-in-the-parameters models from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularization parameters in the elastic net are optimized using a particle swarm optimization (PSO) algorithm at the upper level by minimizing the leave-one-out (LOO) mean square error (LOOMSE). Illustrative examples are included to demonstrate the effectiveness of the new approaches.
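
The two-level structure can be sketched as an inner elastic net fit and an outer search over the two regularization parameters scored by LOOMSE. The code below is a simplified illustration and not the ENOFR algorithm: it uses plain coordinate descent instead of orthogonal forward regression, recomputes the leave-one-out error by brute-force refitting, and replaces PSO with a small grid search. All function names, the toy data and the grid values are assumptions.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam (the L1 proximal step)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def elastic_net(X, y, lam1, lam2, sweeps=200):
    """Coordinate descent for 0.5*||y - X t||^2 + lam1*||t||_1 + 0.5*lam2*||t||^2."""
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of column j with the residual ignoring theta[j]
            z = 0.0    # squared norm of column j
            for i in range(n):
                pred = sum(X[i][k] * theta[k] for k in range(d))
                rho += X[i][j] * (y[i] - pred + X[i][j] * theta[j])
                z += X[i][j] ** 2
            denom = z + lam2
            theta[j] = soft_threshold(rho, lam1) / denom if denom > 0 else 0.0
    return theta


def loomse(X, y, lam1, lam2):
    """Brute-force leave-one-out mean square error: refit once per held-out row."""
    total = 0.0
    for k in range(len(X)):
        theta = elastic_net(X[:k] + X[k + 1:], y[:k] + y[k + 1:], lam1, lam2)
        pred = sum(x * t for x, t in zip(X[k], theta))
        total += (y[k] - pred) ** 2
    return total / len(X)


# Toy data (hypothetical): y depends only on the first regressor.
X = [[1.0, 1.0], [2.0, 0.0], [3.0, 1.0], [4.0, 0.0], [5.0, 1.0], [6.0, 0.0]]
y = [2.0 * row[0] for row in X]

# Upper level: a grid search stands in for PSO in this sketch.
grid = [(l1, l2) for l1 in (0.0, 1.0, 5.0) for l2 in (0.0, 1.0, 5.0)]
best = min(grid, key=lambda p: loomse(X, y, p[0], p[1]))
```

On this noise-free toy data the search selects no regularization at all; with noisy or redundant regressors the LOOMSE minimum would typically move to non-zero values.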

Relevance:

20.00%

Publisher:

Abstract:

Artisanal and small-scale mining (ASM) is replacing smallholder farming as the principal income source in parts of rural Ghana. Structural adjustment policies have removed support for the country’s smallholders, devalued their produce substantially and stiffened competition with large-scale counterparts. Over one million people nationwide are now engaged in ASM. Findings from qualitative research in Ghana’s Eastern Region are drawn upon to improve understanding of the factors driving this pattern of rural livelihood diversification. The ASM sector and farming are shown to be complementary, contrary to common depictions in policy and academic literature.

Relevance:

20.00%

Publisher:

Abstract:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Relevance:

20.00%

Publisher:

Abstract:

OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

Relevance:

20.00%

Publisher:

Abstract:

In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.

Relevance:

20.00%

Publisher:

Abstract:

The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel in a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.

Relevance:

20.00%

Publisher:

Abstract:

Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With sheer amounts of data streams now available for subscription on our smart mobile phones, using this data for decision making with data stream mining techniques has become achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing the significant applications in this area. We have proposed using mobile software agents in this application for several reasons. Most importantly, the autonomic intelligent behaviour of the agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in detail in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.

Relevance:

20.00%

Publisher:

Abstract:

Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.
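
The modular rule induction described above can be sketched with the basic Prism covering loop for categorical attributes: for each class, grow a rule by adding the attribute-value term with the highest class probability until the rule covers only that class, then remove the covered instances and repeat. This is a minimal sketch of the commonly described procedure, not the authors' implementation; the tie-breaking rule, the function names `prism` and `classify`, and the toy dataset are assumptions.

```python
def prism(rows, target):
    """Induce modular rules; rows is a list of dicts, target the class key."""
    attrs = [a for a in rows[0] if a != target]
    rules = []
    for cls in sorted({r[target] for r in rows}):
        remaining = list(rows)  # each class starts from the full training set
        while any(r[target] == cls for r in remaining):
            covered, conds = remaining, {}
            # Specialise the rule until it is pure or attributes run out.
            while any(r[target] != cls for r in covered) and len(conds) < len(attrs):
                best = None
                for a in attrs:
                    if a in conds:
                        continue
                    for v in sorted({r[a] for r in covered}):
                        subset = [r for r in covered if r[a] == v]
                        pos = sum(r[target] == cls for r in subset)
                        # Assumed tie-break: higher precision, then more positives.
                        score = (pos / len(subset), pos)
                        if pos and (best is None or score > best[0]):
                            best = (score, a, v, subset)
                _, a, v, covered = best
                conds[a] = v
            rules.append((conds, cls))
            # Drop every instance the finished rule covers.
            remaining = [r for r in remaining
                         if not all(r[a] == v for a, v in conds.items())]
    return rules


def classify(rules, row):
    """Return the class of the first rule whose conditions all match."""
    for conds, cls in rules:
        if all(row[a] == v for a, v in conds.items()):
            return cls
    return None


rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
rules = prism(rows, "play")
```

The nested scans over attributes, values and covered instances for every rule term make the higher computational cost relative to TDIDT easy to see.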

Relevance:

20.00%

Publisher:

Abstract:

Collaborative mining of distributed data streams in a mobile computing environment is referred to as Pocket Data Mining (PDM). Hoeffding tree techniques have been experimentally and analytically validated for data stream classification. In this paper, we have proposed, developed and evaluated the adoption of distributed Hoeffding trees for classifying streaming data in PDM applications. We have identified a realistic scenario in which different users equipped with smart mobile devices run a local Hoeffding tree classifier on a subset of the attributes. Thus, we have investigated the mining of vertically partitioned datasets with possible overlap of attributes, which is the more likely case. Our experimental results have validated the efficiency of our proposed model, achieving promising accuracy for real deployment.
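
Hoeffding tree learners decide when a leaf has seen enough stream examples to split by comparing the gap between the two best split candidates with the Hoeffding bound. The sketch below assumes the standard form of the bound, sqrt(R² ln(1/δ) / (2n)), where R is the range of the split measure, δ the allowed error probability and n the number of examples seen at the leaf. The function names and sample numbers are illustrative and are not taken from the paper.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation that the observed mean exceeds with probability at most delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute leads the runner-up by more than the bound."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


eps = hoeffding_bound(1.0, 1e-7, 1000)
clear_winner = should_split(0.30, 0.15, 1.0, 1e-7, 1000)
too_close = should_split(0.30, 0.25, 1.0, 1e-7, 1000)
```

Because the bound shrinks as n grows, a leaf that cannot yet separate two close candidates simply waits for more stream examples before splitting.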