283 results for mining algorithm
Abstract:
Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of available data streams to which smart phones can subscribe, or which they can sense, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision-making system. This emerging area of study was shown to be feasible in an earlier study using the technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process starts with mobile agents roaming the network to discover relevant data streams and resources. Other (mobile) agents encapsulating stream mining techniques then visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network, consulting the mining agents for a final collaborative decision when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported, showing the effectiveness of collaborative data mining with the two classifiers.
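The final step, in which an agent consults the mining agents for a collaborative decision, can be illustrated with a weighted vote. This is only a minimal sketch: the abstract does not say how the decision is combined, so the weighted-majority rule, the function name `collaborative_decision` and the `(label, weight)` input format are all assumptions made for illustration.

```python
from collections import defaultdict


def collaborative_decision(votes):
    """Combine agent predictions by weighted majority (an assumed rule).

    votes: list of (label, weight) pairs, one per mining agent, where the
    weight could be, for example, the agent's estimated local accuracy.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    # Iterate labels in sorted order so ties resolve deterministically
    # in favour of the smallest label.
    return max(sorted(totals), key=lambda label: totals[label])


if __name__ == "__main__":
    # Three hypothetical agents: one confident vote for "a", two for "b".
    print(collaborative_decision([("a", 0.9), ("b", 0.6), ("b", 0.5)]))
```

Weighting by local accuracy lets a single strong agent outvote weak ones, while several moderately accurate agents that agree can still prevail.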
Abstract:
Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices such as smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. A thorough experimental study shows that running heterogeneous (different) or homogeneous (similar) data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in performance comparable to batch and centralised learning techniques.
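Vertical partitioning means each device sees every record but only some of its features. The sketch below shows one way to produce such a split. The round-robin assignment of feature columns and the name `vertical_partition` are assumptions for illustration, not the scheme used in the study.

```python
def vertical_partition(records, n_agents):
    """Split the feature columns of each record across n_agents.

    Columns are assigned round-robin (an assumed policy): agent i receives
    columns i, i + n_agents, i + 2 * n_agents, and so on.
    """
    n_features = len(records[0])
    parts = []
    for agent in range(n_agents):
        cols = list(range(agent, n_features, n_agents))
        parts.append([[row[c] for c in cols] for row in records])
    return parts


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [5, 6, 7, 8]]
    for agent_id, view in enumerate(vertical_partition(data, 2)):
        print(agent_id, view)
```

Each agent would then train its own classifier on its view of the stream, and the per-agent predictions would be combined into one decision.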
Abstract:
In recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest, as well as active research, aimed at developing new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.
Abstract:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
Abstract:
This contribution introduces a new digital predistorter to compensate serious distortions caused by memory high power amplifiers (HPAs) which exhibit output saturation characteristics. The proposed design is based on direct learning using a data-driven B-spline Wiener system modeling approach. The nonlinear HPA with memory is first identified based on the B-spline neural network model using the Gauss-Newton algorithm, which incorporates the efficient De Boor algorithm with both B-spline curve and first derivative recursions. The estimated Wiener HPA model is then used to design the Hammerstein predistorter. In particular, the inverse of the amplitude distortion of the HPA's static nonlinearity can be calculated effectively using the Newton-Raphson formula based on the inverse of De Boor algorithm. A major advantage of this approach is that both the Wiener HPA identification and the Hammerstein predistorter inverse can be achieved very efficiently and accurately. Simulation results obtained are presented to demonstrate the effectiveness of this novel digital predistorter design.
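The De Boor algorithm mentioned above evaluates a B-spline curve by repeated convex combination of neighbouring control points. The following is a minimal sketch of the textbook recursion only. It does not reproduce the paper's derivative recursion or its inverse, and the argument names (`k`, `x`, `t`, `c`, `p`) are illustrative choices.

```python
def de_boor(k, x, t, c, p):
    """Evaluate a B-spline at x using the standard De Boor recursion.

    k: index of the knot interval containing x, i.e. t[k] <= x < t[k + 1]
    t: knot vector (non-decreasing list)
    c: control points (len(c) == len(t) - p - 1)
    p: degree of the spline
    """
    # Start from the p + 1 control points that influence this interval.
    d = [c[j + k - p] for j in range(p + 1)]
    for r in range(1, p + 1):
        # Update in place from the right so d[j - 1] is still the old value.
        for j in range(p, r - 1, -1):
            left = t[j + k - p]
            right = t[j + 1 + k - r]
            alpha = (x - left) / (right - left)
            d[j] = (1.0 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]


if __name__ == "__main__":
    # Degree-1 spline: piecewise-linear interpolation of 0, 1, 3 at knots 0, 1, 2.
    knots = [0, 0, 1, 2, 2]
    coeffs = [0.0, 1.0, 3.0]
    print(de_boor(1, 0.5, knots, coeffs, 1))
    print(de_boor(2, 1.5, knots, coeffs, 1))
```

Because each step is a convex combination, the recursion is numerically stable, which is one reason it suits iterative schemes such as Gauss-Newton or Newton-Raphson that evaluate the spline many times.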
Abstract:
This article clarifies what was done with the sub-7-man positions in data-mining Harold van der Heijden's 'HHdbIV' database of chess studies prior to its publication. It emphasises that only positions in the main lines of studies were examined and that the information about uniqueness of move was not incorporated in HHdbIV. There is some reflection on the separate technical and artistic dimensions of study evaluation.
Abstract:
This article explores the contribution that artisanal and small-scale mining (ASM) makes to poverty reduction in Tanzania, based on data on gold and diamond mining in Mwanza Region. The evidence suggests that people working in mining or related services are less likely to be in poverty than those with other occupations. However, the picture is complex; while mining income can help reduce poverty and provide a buffer from livelihood shocks, people's inability to obtain a formal mineral claim, or to effectively exploit their claims, contributes to insecurity. This is reinforced by a context in which ASM is peripheral to large-scale mining interests, is only gradually being addressed within national poverty reduction policies, and is segregated from district-level planning.
Abstract:
This article discusses the character of mineral resource governance at the margins of the state in Tanzania and the way artisanal gold miners are incorporated into mineral sector transformation. The landscape of mineral resource exploitation has changed dramatically over the past 20 years: processes of economic liberalisation have heralded massive foreign investment in large-scale gold mining, while also stimulating artisanal activities. Against this background, the article shows how artisanal gold miners are affected by contradictory processes: some have become integrated with state institutions and legal processes, while others, the large majority, are either further excluded or incorporated in ways that exacerbate insecurity and exploitation, underpinned by socio-economic inequalities. These processes are compounded by the actions of large-scale and medium-scale gold mining companies and by poor local governance. It is open to debate whether this will bring improved integration and welfare for artisanal mining communities or new forms of exclusion, although evidence suggests the latter.
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Abstract:
Evolutionary meta-algorithms for pulse shaping of broadband femtosecond duration laser pulses are proposed. The genetic algorithm searching the evolutionary landscape for desired pulse shapes consists of a population of waveforms (genes), each made from two concatenated vectors, specifying phases and magnitudes, respectively, over a range of frequencies. Frequency-domain operators such as mutation, two-point crossover, average crossover, polynomial phase mutation, creep and three-point smoothing, as well as a time-domain crossover, are combined to produce fitter offspring at each iteration step. The algorithm applies roulette wheel selection, elitism and linear fitness scaling to the gene population. A differential evolution (DE) operator that provides a source of directed mutation and new wavelet operators are proposed. Using properly tuned parameters for DE, the meta-algorithm is used to solve a waveform matching problem. Tuning allows either a greedy directed search near the best known solution or a robust search across the entire parameter space.
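Three of the operators named above can be sketched in their generic textbook forms: roulette wheel selection, two-point crossover, and a differential-evolution style directed mutation. This is an illustration under assumptions, not the authors' implementation. Genes are plain Python lists, and the function names and the `scale` parameter are hypothetical.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if pick < running:
            return individual
    return population[-1]  # guards against floating-point round-off


def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points of parents a and b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def de_mutation(base, x1, x2, scale=0.5):
    """Directed mutation: move base along the difference vector x1 - x2."""
    return [b + scale * (u - v) for b, u, v in zip(base, x1, x2)]


if __name__ == "__main__":
    rng = random.Random(0)
    parents = [[0.0] * 6, [1.0] * 6]
    chosen = roulette_select(parents, [0.2, 0.8], rng)
    child1, child2 = two_point_crossover(parents[0], parents[1], rng)
    print(chosen, child1, child2)
    print(de_mutation([1.0, 1.0], [3.0, 5.0], [1.0, 1.0]))
```

In `de_mutation`, a small `scale` keeps the search close to the base vector, while a larger one explores more widely, which corresponds to the greedy versus robust trade-off the abstract attributes to tuning.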
Abstract:
This paper introduces an architecture for real-time identification and modelling at a copper mine using new technologies such as M2M and cloud computing, with a server in the cloud and an Android client inside the mine. The proposed design brings up pervasive mining, a system with wider coverage, higher communication efficiency, better fault-tolerance, and anytime, anywhere availability. This solution was designed for a plant inside the mine that cannot tolerate interruption and for which in situ, real-time identification is an essential part of the system, used to control aspects such as instability by adjusting the corresponding parameters without stopping the process.
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
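The global reduction in the straightforward parallel formulation can be made concrete with a small single-process simulation. This sketch shows only that baseline, not the communication-free method proposed in the work. One-dimensional points, the list-of-partitions stand-in for processing nodes, and all function names are assumptions for illustration.

```python
def local_stats(points, centroids):
    """Per-node work: assign local points and accumulate sums and counts."""
    k = len(centroids)
    sums = [0.0] * k
    counts = [0] * k
    for x in points:
        nearest = min(range(k), key=lambda c: abs(x - centroids[c]))
        sums[nearest] += x
        counts[nearest] += 1
    return sums, counts


def global_reduce(partials):
    """Stand-in for the global reduction: add up every node's statistics."""
    k = len(partials[0][0])
    sums = [sum(p[0][c] for p in partials) for c in range(k)]
    counts = [sum(p[1][c] for p in partials) for c in range(k)]
    return sums, counts


def kmeans_step(partitions, centroids):
    """One iteration: local statistics, global reduction, centroid update."""
    sums, counts = global_reduce([local_stats(p, centroids) for p in partitions])
    # An empty cluster keeps its previous centroid.
    return [s / n if n else c for s, n, c in zip(sums, counts, centroids)]


if __name__ == "__main__":
    nodes = [[0.0, 1.0], [9.0, 10.0]]  # two simulated processing nodes
    print(kmeans_step(nodes, [0.0, 10.0]))
```

Every node must take part in `global_reduce` at each iteration, which is the scalability bottleneck the abstract describes.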
Abstract:
The third chapter, data mining in education, examines potentials and constraints in the use of data mining in education, summarizing the potential they have to offer meaningful support to: students, teachers, tutors, authors, developers, researchers, and the education and training institutions in which they work and study.
Abstract:
For Northern Hemisphere extra-tropical cyclone activity, the dependency of a potential anthropogenic climate change signal on the identification method applied is analysed. This study investigates the impact of the algorithm used on the changing signal, not the robustness of the climate change signal itself. Using one single transient AOGCM simulation as standard input for eleven state-of-the-art identification methods, the patterns of model simulated present day climatologies are found to be close to those computed from re-analysis, independent of the method applied. Although differences in the total number of cyclones identified exist, the climate change signals (IPCC SRES A1B) in the model run considered are largely similar between methods for all cyclones. Taking into account all tracks, decreasing numbers are found in the Mediterranean, the Arctic in the Barents and Greenland Seas, the mid-latitude Pacific and North America. Changing patterns are even more similar if only the most severe systems are considered: the methods reveal a coherent, statistically significant increase in frequency over the eastern North Atlantic and North Pacific. We found that the differences between the methods considered are largely due to the different role of weaker systems in the specific methods.