165 results for stream mining
in CentAUR: Central Archive University of Reading - UK
Abstract:
In this article, we review the state-of-the-art techniques in mining data streams for mobile and ubiquitous environments. We start the review with a concise background of data stream processing, presenting the building blocks for mining data streams. In a wide range of applications, data streams must be processed on small ubiquitous devices such as smartphones and sensor devices. Mobile and ubiquitous data mining targets these applications with tailored techniques and approaches that address scarcity of resources and mobility issues. Two categories can be identified for mobile and ubiquitous mining of streaming data: single-node and distributed. This survey covers both categories. Mining mobile and ubiquitous data requires algorithms that can monitor the available computational resources and adapt their working conditions accordingly. We identify the key characteristics of these algorithms and present illustrative applications. Distributed data stream mining in the mobile environment is then discussed, presenting the Pocket Data Mining framework. Mobility of users stimulates the adoption of context-awareness in this area of research. Context-awareness and collaboration are discussed under Collaborative Data Stream Mining, where agents share knowledge to learn adaptive, accurate models.
Abstract:
Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With large numbers of data streams now available for subscription on smart mobile phones, using this data for decision making through data stream mining techniques has become achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing the significant applications in this area. We propose using mobile software agents in this application for several reasons. Most importantly, the autonomic intelligent behaviour of agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in detail in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.
Abstract:
Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of available data streams that smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision making system. This emerging area of study was shown to be feasible in an earlier study using the technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process starts with mobile agents roaming the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of collaborative data mining with the two classifiers.
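The collaborative decision over vertically partitioned streams described above can be illustrated with a small sketch. It is not the PDM implementation: the class `LocalNaiveBayes`, the function `collaborative_decision`, the restriction to binary attribute values, and the unweighted majority vote are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


class LocalNaiveBayes:
    """Incremental categorical Naive Bayes that sees only a subset of attributes.

    Hypothetical stand-in for a mining agent running on one device.
    """

    def __init__(self, attrs):
        self.attrs = attrs                        # attribute indices visible to this node
        self.class_counts = Counter()             # label -> number of instances seen
        self.value_counts = defaultdict(Counter)  # (attribute, label) -> value counts

    def learn(self, x, label):
        """Update the counts with one labelled instance from the stream."""
        self.class_counts[label] += 1
        for a in self.attrs:
            self.value_counts[(a, label)][x[a]] += 1

    def predict(self, x):
        """Return the most probable label given the visible attributes."""
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for a in self.attrs:
                # Laplace smoothing; the "+ 2" assumes binary attribute values.
                seen = self.value_counts[(a, label)][x[a]]
                score += math.log((seen + 1) / (n + 2))
            if score > best_score:
                best, best_score = label, score
        return best


def collaborative_decision(nodes, x):
    """Assumed decision rule: unweighted majority vote over the local predictions."""
    votes = Counter(node.predict(x) for node in nodes)
    return votes.most_common(1)[0][0]
```

In this sketch each node learns from the same instances but only reads its own attribute indices, which mimics vertical partitioning with overlapping attributes. A weighted vote based on each node's estimated accuracy would be a natural variation.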
Abstract:
Advances in hardware technologies allow data to be captured and processed in real time, and the resulting high-throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high-velocity data streams, where data instances arrive at a fast rate. This is problematic if an application requires no delay, or only a very small delay, between changes in the patterns of the stream and the absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non-streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
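As a rough illustration of why KNN lends itself to real-time adaptation, the sketch below keeps only the most recent labelled instances in a sliding window, so outdated patterns are forgotten automatically. It is a single-node sketch under stated assumptions, not the parallel methodology investigated in the paper: the class name `SlidingWindowKNN`, the fixed-size window, and the Euclidean distance are all illustrative choices.

```python
import math
from collections import Counter, deque


class SlidingWindowKNN:
    """KNN classifier over a bounded window of the most recent instances."""

    def __init__(self, k=3, window_size=100):
        self.k = k
        # A deque with maxlen silently drops the oldest instance when full.
        self.window = deque(maxlen=window_size)

    def learn(self, x, label):
        """Absorb one labelled instance; no model rebuild is needed."""
        self.window.append((tuple(x), label))

    def predict(self, x):
        """Majority label among the k nearest instances still in the window."""
        if not self.window:
            return None
        nearest = sorted(self.window, key=lambda item: math.dist(item[0], x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Because learning is a constant-time append, the delay between a change in the stream and its absorption by the classifier is bounded by the window size. The distance computations in `predict` are independent of each other, which is the part a parallel version would distribute.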
Abstract:
Advances in hardware and software in the past decade allow fast data streams to be captured, recorded and processed at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
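The two behaviours highlighted above, abstaining when no rule applies and discarding rules that stop matching the stream, can be sketched as follows. This is not the eRules algorithm: the names `Rule` and `AbstainingRuleSet`, the first-match policy, and the removal criterion (`max_error_rate`, `min_hits`) are assumptions chosen for illustration, and rule induction itself is left out.

```python
class Rule:
    """A conjunction of attribute == value tests that predicts one label."""

    def __init__(self, conditions, label):
        self.conditions = conditions  # dict: attribute name -> required value
        self.label = label
        self.hits = 0                 # labelled instances this rule has covered
        self.errors = 0               # covered instances it would have misclassified

    def covers(self, x):
        return all(x.get(attr) == value for attr, value in self.conditions.items())


class AbstainingRuleSet:
    """Rule set that returns None instead of forcing a classification."""

    def __init__(self, max_error_rate=0.5, min_hits=5):
        self.rules = []
        self.max_error_rate = max_error_rate  # assumed removal threshold
        self.min_hits = min_hits              # evidence required before removal

    def classify(self, x):
        """Label of the first covering rule, or None (abstain) if none covers x."""
        for rule in self.rules:
            if rule.covers(x):
                return rule.label
        return None

    def evaluate(self, x, true_label):
        """Score covering rules on a labelled instance and drop unreliable ones."""
        for rule in self.rules:
            if rule.covers(x):
                rule.hits += 1
                if rule.label != true_label:
                    rule.errors += 1
        self.rules = [
            r for r in self.rules
            if r.hits < self.min_hits or r.errors / r.hits <= self.max_error_rate
        ]
```

A caller would treat a `None` result as "unclassified" and could use the rate of abstentions as a signal that new rules need to be induced from recent data.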
Abstract:
Advances in hardware and software technologies allow streaming data to be captured. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as they are generated in real time. Data stream classification is one of the most important DSM techniques, allowing previously unseen data instances to be classified. Unlike traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules, which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as a heuristic for building rule terms on continuous attributes. Using eRules as an example, we show that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a level of accuracy similar to that of the original eRules classifier. We term this new version of eRules with our approach G-eRules.
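One way to picture a Gaussian heuristic for continuous attributes is to keep a running mean and standard deviation per attribute and class, and to express a rule term as an interval around the mean. The sketch below does only that and makes no claim about how G-eRules actually builds its terms: the names `RunningGaussian` and `rule_term_covers`, the use of Welford's online update, and the `width` parameter are assumptions for illustration.

```python
import math


class RunningGaussian:
    """Online mean and sample standard deviation (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Absorb one attribute value without storing the stream."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def interval(self, width=1.0):
        """Assumed rule-term bounds: mean plus or minus width standard deviations."""
        return (self.mean - width * self.std, self.mean + width * self.std)


def rule_term_covers(gaussian, value, width=1.0):
    """True if value satisfies the term 'low <= attribute <= high'."""
    low, high = gaussian.interval(width)
    return low <= value <= high
```

The appeal of this kind of summary is that each new value costs a constant-time update, so no sorting or repeated scanning of candidate split points on the continuous attribute is needed.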
Abstract:
Collaborative mining of distributed data streams in a mobile computing environment is referred to as Pocket Data Mining (PDM). Hoeffding tree techniques have been experimentally and analytically validated for data stream classification. In this paper, we propose, develop and evaluate the adoption of distributed Hoeffding trees for classifying streaming data in PDM applications. We identify a realistic scenario in which different users equipped with smart mobile devices each run a local Hoeffding tree classifier on a subset of the attributes. We thus investigate the mining of vertically partitioned datasets with possible overlap of attributes, which is the more likely case. Our experimental results validate the efficiency of the proposed model, which achieves promising accuracy for real deployment.
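For readers unfamiliar with Hoeffding trees, the standard split test can be sketched in a few lines. With probability 1 - delta, the true mean of a variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the mean observed over n instances, so a leaf is split once the gain of the best attribute exceeds that of the runner-up by more than epsilon. The function names below are illustrative, and the choice of information gain (with range log2 of the number of classes) is an assumption; the abstract does not state which split measure the authors used.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the true mean
    with probability 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n):
    """Split a leaf when the best attribute is reliably better than the runner-up.

    Assumes information gain as the split measure, whose range is log2(n_classes).
    """
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because epsilon shrinks as n grows, a leaf that cannot separate two close attributes after 1,000 instances may still split later, which is what lets the tree grow from a stream without storing it.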
Abstract:
Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices such as smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. A thorough experimental study shows that running heterogeneous (different) or homogeneous (similar) data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in performance comparable to that of batch and centralised learning techniques.
Abstract:
The spatial and temporal dynamics in the stream water NO3-N concentrations in a major European river system, the Garonne (62,700 km²), are described and related to variations in climate, land management, and effluent point sources using multivariate statistics. Building on this, the Hydrologiska Byråns Vattenbalansavdelning (HBV) rainfall-runoff model and the Integrated Catchment Model of Nitrogen (INCA-N) are applied to simulate the observed flow and N dynamics. This is done to help us understand which factors and processes control the flow and N dynamics in different climate zones and to assess the relative inputs from diffuse and point sources across the catchment. This is the first application of the linked HBV and INCA-N models to a major European river system commensurate with the largest basins to be managed under the Water Framework Directive. The simulations suggest that in the lowlands, seasonal patterns in the stream water NO3-N concentrations emerge and are dominated by diffuse agricultural inputs, with an estimated 75% of the river load in the lowlands derived from arable farming. The results confirm earlier European catchment studies: current semi-distributed catchment-scale dynamic models, which integrate variations in land cover, climate, and a simple representation of the terrestrial and in-stream N cycle, are able to simulate seasonal NO3-N patterns at large spatial (> 300 km²) and temporal (≥ monthly) scales using available national datasets.
Abstract:
Trace elements may present an environmental hazard in the vicinity of mining and smelting activities. However, the factors controlling their distribution and transfer within the soil and vegetation systems are not always well defined. Total concentrations of up to 15,195 mg·kg⁻¹ As, 6,690 mg·kg⁻¹ Cu, 24,820 mg·kg⁻¹ Pb and 9,810 mg·kg⁻¹ Zn in soils, and 62 mg·kg⁻¹ As, 1,765 mg·kg⁻¹ Cu, 280 mg·kg⁻¹ Pb and 3,460 mg·kg⁻¹ Zn in vegetation were measured. However, unusually for smelters and mines of a similar size, the elevated trace element concentrations in soils were found to be restricted to the immediate vicinity of the mines and smelters (maximum 2-3 km). Parent material, prevailing wind direction, and soil physical and chemical characteristics were found to correlate poorly with the restricted trace element distributions in soils. Hypotheses are given for this unusual distribution: (1) the contaminated soils were removed by erosion or (2) mines and smelters released large heavy particles that could not have been transported long distances. Analyses of the accumulation of trace elements in vegetation (median ratios: As 0.06, Cu 0.19, Pb 0.54 and Zn 1.07) and the percentage of total trace elements being DTPA extractable in soils (median percentages: As 0.06%, Cu 15%, Pb 7% and Zn 4%) indicated higher relative trace element mobility in soils with low total concentrations than in soils with elevated concentrations.
Abstract:
Trace elements may present an environmental hazard in the vicinity of mining and smelting activities. However, the factors controlling trace element distribution in soils around ancient and modern mining and smelting areas are not always clear. Tharsis, Riotinto and Huelva are located in the Iberian Pyrite Belt in SW Spain. Tharsis and Riotinto mines have been exploited since 2500 B.C., with intensive smelting taking place. Huelva, established in 1970 and using the Flash Furnace Outokumpu process, is currently one of the largest smelters in the world. Pyrite and chalcopyrite ore have been intensively smelted for Cu. However, unusually for smelters and mines of a similar size, the elevated trace element concentrations in soils were found to be restricted to the immediate vicinity of the mines and smelters, being found up to a maximum of 2 km from the mines and smelters at Tharsis, Riotinto and Huelva. Trace element partitioning (over 2/3 of trace elements found in the residual immobile fraction of soils at Tharsis) and examination of soil particles by SEM-EDX showed that trace elements were not adsorbed onto soil particles, but were included within the matrix of large trace element-rich Fe silicate slag particles (i.e. 1 mm in diameter, with at least 1 wt.% As, Cu and Zn, and 2 wt.% Pb). The large size of the slag particles (1 mm in diameter) was found to control the geographically restricted trace element distribution in soils at Tharsis, Riotinto and Huelva, since large heavy particles could not have been transported long distances. Distribution and partitioning indicated that impacts to the environment as a result of mining and smelting should remain minimal in the region. © 2006 Elsevier B.V. All rights reserved.
Abstract:
Toxic trace elements present an environmental hazard in the vicinity of mining and smelting activities. However, the processes of transfer of these elements to groundwater and to plants are not always clear. Tharsis mine, in the Iberian pyrite belt (SW Spain), has been exploited since 2500 BC, with extensive smelting taking place from the 1850s until the 1920s. Sixty-four soil (mainly topsoil) and vegetation samples were collected in February 2001 and analysed by ICP-AES for 23 elements. Concentrations are 6-6300 mg kg⁻¹ As and 14-24800 mg kg⁻¹ Pb in soils, and 0.20-9 mg kg⁻¹ As and 2-195 mg kg⁻¹ Pb in vegetation. Trace element concentrations decrease rapidly away from the mine, with As and Pb concentrations in the range 6-1850 mg kg⁻¹ (median 22 mg kg⁻¹) and 14-31 mg kg⁻¹ (median 43 mg kg⁻¹), respectively, 1 km away from the mine. These concentrations are low when compared to other well-studied mining and smelting areas (e.g. 600 mg kg⁻¹ As at 8 km from Yellowknife smelter, Canada; >100 mg kg⁻¹ Pb over 270 km² around the Pb-Zn Port Pirie smelter, South Australia; mean of 1419 mg kg⁻¹ Pb around Aberystwyth smelter, Wales, UK). The high metal content of the vegetation and the low soil pH (mean pH 4.93) indicate the potential for trace element mobility, which could explain the relatively low concentration of metals in Tharsis topsoils and threaten plans to redevelop the Tharsis area as an orange plantation.