35 resultados para spatial temporal data mining
Resumo:
A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.
Resumo:
A novel algorithm for Virtual View Synthesis based on Non-Local Means Filtering is presented in this paper. Apart from using the video frames from the nearby cameras and the corresponding per-pixel depth map, this algorithm also makes use of the previously synthesized frame. Simple and efficient, the algorithm can synthesize video at any given virtual viewpoint at a faster rate. In the process, the quality of the synthesized frame is not compromised. Experimental results prove the above mentioned claim. The subjective and objective quality of the synthesized frames are comparable to the existing algorithms.
Resumo:
The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems data is available amongst multiple parties collaborating to achieve cumulative mining accuracy. The vertically partitioned data available with the parties involved cannot provide accurate mining results when compared to the collaborative mining results. To overcome the privacy issue in data disclosure this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the publication of local association rules generated by the parties is published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets established are used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4. 5 based KDLPPDM system and the CS. 0 based KDLPPDM system using receiver operating characteristics curves (ROC).
Resumo:
Online Social Networks (OSNs) facilitate to create and spread information easily and rapidly, influencing others to participate and propagandize. This work proposes a novel method of profiling Influential Blogger (IB) based on the activities performed on one's blog documents who influences various other bloggers in Social Blog Network (SBN). After constructing a social blogging site, a SBN is analyzed with appropriate parameters to get the Influential Blog Power (IBP) of each blogger in the network and demonstrate that profiling IB is adequate and accurate. The proposed Profiling Influential Blogger (PIB) Algorithm survival rate of IB is high and stable. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Resumo:
Frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discoverymethods for serial episodes under these different notions of frequencies. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies, and we present quantitative relationships among different frequencies.Our unified view also helps in obtaining correctness proofs for various counting algorithms as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps to consider generalization of the algorithm to count episodes with general partial orders.
Resumo:
Discovering patterns in temporal data is an important task in Data Mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework(and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodesthat we want to discover may need to contain information regarding event durations etc. In this paper we extend Mannila et al.’s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.
Resumo:
Frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discovery methods for serial episodes under these different notions of frequencies. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies, and we present quantitative relationships among different frequencies. Our unified view also helps in obtaining correctness proofs for various counting algorithms as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps to consider generalization of the algorithm to count episodes with general partial orders.
Resumo:
Land cover (LC) changes play a major role in global as well as at regional scale patterns of the climate and biogeochemistry of the Earth system. LC information presents critical insights in understanding of Earth surface phenomena, particularly useful when obtained synoptically from remote sensing data. However, for developing countries and those with large geographical extent, regular LC mapping is prohibitive with data from commercial sensors (high cost factor) of limited spatial coverage (low temporal resolution and band swath). In this context, free MODIS data with good spectro-temporal resolution meet the purpose. LC mapping from these data has continuously evolved with advances in classification algorithms. This paper presents a comparative study of two robust data mining techniques, the multilayer perceptron (MLP) and decision tree (DT) on different products of MODIS data corresponding to Kolar district, Karnataka, India. The MODIS classified images when compared at three different spatial scales (at district level, taluk level and pixel level) shows that MLP based classification on minimum noise fraction components on MODIS 36 bands provide the most accurate LC mapping with 86% accuracy, while DT on MODIS 36 bands principal components leads to less accurate classification (69%).
Resumo:
With the emergence of large-volume and high-speed streaming data, the recent techniques for stream mining of CFIpsilas (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high speed data streams, the rate of change of information across different sliding windows will be negligible. So, the user wonpsilat be devoid of change in information if we slide window by multiple transactions at a time. Therefore, we propose a novel approach for mining CFIpsilas cumulatively by making sliding width(ges1) over high speed data streams. However, it is nontrivial to mine CFIpsilas cumulatively over stream, because such growth may lead to the generation of exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIpsilas over stream by exploring some interesting properties. Our performance study reveals that stream-close achieves good scalability and has promising results.
Resumo:
Urbanisation is a dynamic complex phenomenon involving large scale changes in the land uses at local levels. Analyses of changes in land uses in urban environments provide a historical perspective of land use and give an opportunity to assess the spatial patterns, correlation, trends, rate and impacts of the change, which would help in better regional planning and good governance of the region. Main objective of this research is to quantify the urban dynamics using temporal remote sensing data with the help of well-established landscape metrics. Bangalore being one of the rapidly urbanising landscapes in India has been chosen for this investigation. Complex process of urban sprawl was modelled using spatio temporal analysis. Land use analyses show 584% growth in built-up area during the last four decades with the decline of vegetation by 66% and water bodies by 74%. Analyses of the temporal data reveals an increase in urban built up area of 342.83% (during 1973-1992), 129.56% (during 1992-1999), 106.7% (1999-2002), 114.51% (2002-2006) and 126.19% from 2006 to 2010. The Study area was divided into four zones and each zone is further divided into 17 concentric circles of 1 km incrementing radius to understand the patterns and extent of the urbanisation at local levels. The urban density gradient illustrates radial pattern of urbanisation for the period 1973-2010. Bangalore grew radially from 1973 to 2010 indicating that the urbanisation is intensifying from the central core and has reached the periphery of the Greater Bangalore. Shannon's entropy, alpha and beta population densities were computed to understand the level of urbanisation at local levels. Shannon's entropy values of recent time confirms dispersed haphazard urban growth in the city, particularly in the outskirts of the city. This also illustrates the extent of influence of drivers of urbanisation in various directions. Landscape metrics provided in depth knowledge about the sprawl. Principal component analysis helped in prioritizing the metrics for detailed analyses. The results clearly indicates that whole landscape is aggregating to a large patch in 2010 as compared to earlier years which was dominated by several small patches. The large scale conversion of small patches to large single patch can be seen from 2006 to 2010. In the year 2010 patches are maximally aggregated indicating that the city is becoming more compact and more urbanised in recent years. Bangalore was the most sought after destination for its climatic condition and the availability of various facilities (land availability, economy, political factors) compared to other cities. The growth into a single urban patch can be attributed to rapid urbanisation coupled with the industrialisation. Monitoring of growth through landscape metrics helps to maintain and manage the natural resources. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X. Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
Resumo:
The rapid growth in the field of data mining has lead to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
Resumo:
Image fusion techniques are useful to integrate the geometric detail of a high-resolution panchromatic (PAN) image and the spectral information of a low-resolution multispectral (MSS) image, particularly important for understanding land use dynamics at larger scale (1:25000 or lower), which is required by the decision makers to adopt holistic approaches for regional planning. Fused images can extract features from source images and provide more information than one scene of MSS image. High spectral resolution aids in identification of objects more distinctly while high spatial resolution allows locating the objects more clearly. The geoinformatics technologies with an ability to provide high-spatial-spectral-resolution data helps in inventorying, mapping, monitoring and sustainable management of natural resources. Fusion module in GRDSS, taking into consideration the limitations in spatial resolution of MSS data and spectral resolution of PAN data, provide high-spatial-spectral-resolution remote sensing images required for land use mapping on regional scale. GRDSS is a freeware GIS Graphic User Interface (GUI) developed in Tcl/Tk is based on command line arguments of GRASS (Geographic Resources Analysis Support System) with the functionalities for raster analysis, vector analysis, site analysis, image processing, modeling and graphics visualization. It has the capabilities to capture, store, process, analyse, prioritize and display spatial and temporal data.
Resumo:
Wetlands are the most productive and biologically diverse but very fragile ecosystems. They are vulnerable to even small changes in their biotic and abiotic factors. In recent years, there has been concern over the continuous degradation of wetlands due to unplanned developmental activities. This necessitates inventorying, mapping, and monitoring of wetlands to implement sustainable management approaches. The principal objective of this work is to evolve a strategy to identify and monitor wetlands using temporal remote sensing (RS) data. Pattern classifiers were used to extract wetlands automatically from NIR bands of MODIS, Landsat MSS and Landsat TM remote sensing data. MODIS provided data for 2002 to 2007, while for 1973 and 1992 IR Bands of Landsat MSS and TM (79m and 30m spatial resolution) data were used. Principal components of IR bands of MODIS (250 m) were fused with IRS LISS-3 NIR (23.5 m). To extract wetlands, statistical unsupervised learning of IR bands for the respective temporal data was performed using Bayesian approach based on prior probability, mean and covariance. Temporal analysis of wetlands indicates a sharp decline of 58% in Greater Bangalore attributing to intense urbanization processes, evident from a 466% increase in built-up area from 1973 to 2007.
Resumo:
Land cover (LC) and land use (LU) dynamics induced by human and natural processes play a major role in global as well as regional patterns of landscapes influencing biodiversity, hydrology, ecology and climate. Changes in LC features resulting in forest fragmentations have posed direct threats to biodiversity, endangering the sustainability of ecological goods and services. Habitat fragmentation is of added concern as the residual spatial patterns mitigate or exacerbate edge effects. LU dynamics are obtained by classifying temporal remotely sensed satellite imagery of different spatial and spectral resolutions. This paper reviews five different image classification algorithms using spatio-temporal data of a temperate watershed in Himachal Pradesh, India. Gaussian Maximum Likelihood classifier was found to be apt for analysing spatial pattern at regional scale based on accuracy assessment through error matrix and ROC (receiver operating characteristic) curves. The LU information thus derived was then used to assess spatial changes from temporal data using principal component analysis and correspondence analysis based image differencing. The forest area dynamics was further studied by analysing the different types of fragmentation through forest fragmentation models. The computed forest fragmentation and landscape metrics show a decline of interior intact forests with a substantial increase in patch forest during 1972-2007.