67 results for Graph mining


Relevance:

20.00%

Abstract:

Rapid urbanisation in India has posed serious challenges to decision makers in regional planning, involving a plethora of issues including the provision of basic amenities (like electricity, water, sanitation, transport, etc.). Urban planning entails an understanding of landscape and urban dynamics together with their causal factors. Identifying, delineating and mapping landscapes on a temporal scale provides an opportunity to monitor changes, which is important for natural resource management and sustainable planning activities. Multi-source, multi-sensor, multi-temporal, multi-frequency or multi-polarization remote sensing data, combined with efficient classification algorithms and pattern recognition techniques, aid in capturing these dynamics. This paper analyses the landscape dynamics of Greater Bangalore by: (i) characterisation of direct impervious surface, (ii) computation of forest fragmentation indices and (iii) modelling to quantify and categorise urban changes. Linear unmixing is used to solve the mixed pixel problem of coarse-resolution, super-spectral MODIS data for impervious surface characterisation. Fragmentation indices were used to classify forests as interior, perforated, edge, transitional, patch and undetermined. Based on this, an urban growth model was developed to determine the type of urban growth: Infill, Expansion and Outlying growth. This helped in visualising urban growth poles and the consequences of earlier policy decisions, which can help in evolving strategies for effective land use policies.
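A minimal sketch of the linear unmixing idea: each coarse pixel is modelled as a sum-to-one mixture of pure endmember spectra, and the fractions are recovered by least squares. It assumes the endmember spectra (for example impervious surface, vegetation and water) are already known, and it enforces the sum-to-one constraint with a heavily weighted row of ones, which is one common approach and not necessarily the one used in the paper. `unmix_pixel` and its `weight` parameter are hypothetical.

```python
import numpy as np


def unmix_pixel(pixel, endmembers, weight=1e3):
    """Estimate endmember fractions for one mixed pixel by linear unmixing.

    pixel:      (bands,) observed reflectance of a coarse pixel
    endmembers: (bands, k) pure spectra, one column per land-cover class
    weight:     hypothetical penalty weight that softly enforces sum-to-one
    """
    _, k = endmembers.shape
    # Append a weighted row of ones so least squares also fits sum(fractions) = 1.
    A = np.vstack([endmembers, weight * np.ones((1, k))])
    b = np.concatenate([pixel, [weight]])
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Simple heuristic for non-negativity: clip, then renormalise to sum to one.
    fractions = np.clip(fractions, 0.0, None)
    return fractions / fractions.sum()
```

Run over every pixel of a scene, the impervious-surface entry of the returned fractions gives a sub-pixel impervious map; clipping negatives is a simple heuristic rather than a fully constrained solver.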

Relevance:

20.00%

Abstract:

We consider evolving exponential random geometric graphs (RGGs) in one dimension and characterize the time-dependent behavior of some of their topological properties. We consider two evolution models and study one of them in detail while providing a summary of the results for the other. In the first model, the inter-nodal gaps evolve according to an exponential AR(1) process that makes the stationary distribution of the node locations exponential. For this model we obtain the one-step conditional connectivity probabilities and extend them to the k-step case. Finite and asymptotic analyses are given. We then obtain the k-step connectivity probability conditioned on the network being disconnected. We also derive the probability mass function (pmf) of the first passage time for a connected network to become disconnected. We then describe a random birth-death model where, at each instant, the node locations evolve according to an AR(1) process. In addition, a random node is allowed to die while giving birth to a node at another location. We derive properties similar to those above.
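The first evolution model can be explored by simulation. The sketch below assumes the Gaver-Lewis EAR(1) construction, one standard exponential AR(1) process whose stationary marginal is exponential; the abstract does not say which construction the paper uses. It relies on the fact that a one-dimensional RGG with connection range `r` is connected exactly when every inter-nodal gap is at most `r`, and estimates the k-step conditional connectivity probability by Monte Carlo. All function names and parameters are hypothetical.

```python
import numpy as np


def ear1_step(gaps, rho, lam, rng):
    """One step of a Gaver-Lewis exponential AR(1) process, applied to every gap.

    X_{t+1} = rho * X_t + E_t, where E_t = 0 with probability rho and
    E_t ~ Exp(rate lam) otherwise. If X_t ~ Exp(lam), so is X_{t+1}.
    """
    innovation = rng.exponential(1.0 / lam, size=gaps.shape)
    present = rng.random(gaps.shape) >= rho  # innovation added w.p. 1 - rho
    return rho * gaps + innovation * present


def conditional_connectivity(n_nodes, r, rho, lam, k=1, trials=100_000, seed=0):
    """Monte Carlo estimate of P(connected at t + k | connected at t)."""
    rng = np.random.default_rng(seed)
    # Start each trial from the stationary (exponential) gap distribution.
    gaps = rng.exponential(1.0 / lam, size=(trials, n_nodes - 1))
    # In one dimension the graph is connected iff every gap is <= r.
    start = np.all(gaps <= r, axis=1)
    for _ in range(k):
        gaps = ear1_step(gaps, rho, lam, rng)
    end = np.all(gaps <= r, axis=1)
    return end[start].mean()
```

With `rho = 0` the gaps are redrawn independently, so the estimate should approach the unconditional probability (1 - exp(-lam * r)) ** (n_nodes - 1); a larger `rho` makes a connected network more likely to stay connected.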

Relevance:

20.00%

Abstract:

We consider the problem of computing a minimum cycle basis in a directed graph $G$. The input to this problem is a directed graph whose arcs have positive weights. In this problem a $\{-1, 0, 1\}$ incidence vector is associated with each cycle, and the vector space over $\mathbb{Q}$ generated by these vectors is the cycle space of $G$. A set of cycles is called a cycle basis of $G$ if it forms a basis for its cycle space. A cycle basis where the sum of the weights of the cycles is minimum is called a minimum cycle basis of $G$. The current fastest algorithm for computing a minimum cycle basis in a directed graph with $m$ arcs and $n$ vertices runs in $O(m^{\omega+1} n)$ time (where $\omega < 2.376$ is the exponent of matrix multiplication). If one allows randomization, then an $\tilde{O}(m^3 n)$ algorithm is known for this problem. In this paper we present a simple $\tilde{O}(m^2 n)$ randomized algorithm for this problem. The problem of computing a minimum cycle basis in an undirected graph has been well studied. In this problem a $\{0, 1\}$ incidence vector is associated with each cycle, and the vector space over $\mathbb{F}_2$ generated by these vectors is the cycle space of the graph. The fastest known algorithm for computing a minimum cycle basis in an undirected graph runs in $O(m^2 n + m n^2 \log n)$ time, and our randomized algorithm for directed graphs almost matches this running time.
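The definitions above can be checked mechanically. This sketch does not implement the paper's randomized algorithm; it only verifies whether a given set of {-1, 0, 1} cycle vectors is a cycle basis, using exact rational Gaussian elimination (so rank is computed over Q rather than over F2) and the standard fact that the cycle space of a graph with m arcs, n vertices and c connected components has dimension m - n + c. The helper names and the small example graph are illustrative.

```python
from fractions import Fraction


def rank_over_q(vectors):
    """Rank of integer vectors over the rationals, via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank, ncols = 0, (len(rows[0]) if rows else 0)
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def is_cycle_basis(cycles, num_arcs, num_vertices, num_components=1):
    """A set of cycle vectors is a basis iff it has m - n + c independent members."""
    dim = num_arcs - num_vertices + num_components
    return len(cycles) == dim and rank_over_q(cycles) == dim


def basis_weight(cycles, weights):
    """Total weight: each cycle contributes the weights of the arcs it uses."""
    return sum(w for c in cycles for w, x in zip(weights, c) if x != 0)


# Example: arcs a0: 0->1, a1: 1->2, a2: 2->0, a3: 2->3, a4: 3->0  (m = 5, n = 4)
c1 = [1, 1, 1, 0, 0]    # 0->1->2->0
c2 = [1, 1, 0, 1, 1]    # 0->1->2->3->0
c3 = [0, 0, -1, 1, 1]   # 2->3->0->2, traversing a2 against its direction
```

With unit arc weights, {c1, c3} has weight 6 and {c1, c2} has weight 7; since every cycle in this graph uses at least three arcs, {c1, c3} is a minimum cycle basis here.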

Relevance:

20.00%

Abstract:

Mining association rules from a large collection of databases is based on two main tasks: one is the generation of large itemsets, and the other is finding associations between the discovered large itemsets. Existing formalisms for association rules are based on a single transaction database, which is not sufficient to describe association rules in a multiple-database environment. In this paper, we give a general characterization of association rules and also give a framework for knowledge-based mining of multiple databases for association rules.
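As a point of reference for the multi-database setting, the sketch below counts itemsets separately in each database, also pools the counts, and then derives pairwise rules from the pooled totals. Pooling is only one possible way to combine databases and is an assumption here, not the paper's characterization; the brute-force enumeration stands in for a proper large-itemset generator such as Apriori. Function names are hypothetical.

```python
from itertools import combinations


def itemset_counts(databases, max_size=2):
    """Count every itemset of size <= max_size, per database and pooled over all."""
    per_db = []
    for db in databases:
        counts = {}
        for txn in db:
            items = sorted(set(txn))
            for size in range(1, max_size + 1):
                for itemset in combinations(items, size):
                    counts[itemset] = counts.get(itemset, 0) + 1
        per_db.append(counts)
    pooled = {}
    for counts in per_db:
        for itemset, c in counts.items():
            pooled[itemset] = pooled.get(itemset, 0) + c
    return per_db, pooled


def rules(databases, min_support, min_confidence):
    """Pairwise rules (lhs, rhs, support, confidence) from the pooled counts."""
    _, pooled = itemset_counts(databases)
    n = sum(len(db) for db in databases)  # total transactions across databases
    out = []
    for itemset, c in pooled.items():
        if len(itemset) != 2 or c / n < min_support:
            continue  # keep only large 2-itemsets
        for lhs, rhs in (itemset, itemset[::-1]):
            conf = c / pooled[(lhs,)]
            if conf >= min_confidence:
                out.append((lhs, rhs, c / n, conf))
    return out
```

Keeping `per_db` alongside `pooled` lets one check whether a rule that holds globally is also supported in each individual database.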

Relevance:

20.00%

Abstract:

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships, which in turn lead to a better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade, many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

Relevance:

20.00%

Abstract:

A method, system, and computer program product for fault data correlation in a diagnostic system are provided. The method includes receiving the fault data including a plurality of faults collected over a period of time, and identifying a plurality of episodes within the fault data, where each episode includes a sequence of the faults. The method further includes calculating a frequency of the episodes within the fault data, calculating a correlation confidence of the faults relative to the episodes as a function of the frequency of the episodes, and outputting a report of the faults with the correlation confidence.
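One way to picture the claimed steps is sketched below. Episodes are treated as serial episodes (ordered fault codes), and episode frequency as the number of non-overlapped occurrences found by a single left-to-right automaton. The abstract only says the correlation confidence is a function of episode frequency, so the formula used here (the share of a fault's occurrences accounted for by the episode) is a hypothetical choice, as are all names.

```python
from collections import Counter


def count_episode(faults, episode):
    """Count non-overlapped occurrences of a serial episode in a fault sequence.

    faults:  fault codes ordered by time of collection
    episode: tuple of fault codes that must appear in this order
    A single left-to-right automaton; matching greedily maximises the count.
    """
    count, state = 0, 0
    for code in faults:
        if code == episode[state]:
            state += 1
            if state == len(episode):  # occurrence complete: count it and restart
                count, state = count + 1, 0
    return count


def correlation_report(faults, episodes):
    """Rows (episode, fault, episode_frequency, confidence), highest confidence first.

    Hypothetical confidence: the share of a fault's occurrences accounted for
    by the non-overlapped occurrences of the episode.
    """
    totals = Counter(faults)
    rows = []
    for ep in episodes:
        freq = count_episode(faults, ep)
        for code in set(ep):
            conf = freq * ep.count(code) / totals[code] if totals[code] else 0.0
            rows.append((ep, code, freq, conf))
    return sorted(rows, key=lambda row: -row[3])
```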

Relevance:

20.00%

Abstract:

A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series having events with start times and end times, a set of allowed dwelling times and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.
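The counting step described above can be sketched as follows, assuming each event is a `(type, start, end)` tuple, a candidate episode is an ordered list of event types each paired with its allowed dwelling-time intervals, and frequency means non-overlapped occurrences per event in the series. Those conventions, the function names and the omission of candidate generation are assumptions made for illustration; the single tracking automaton plays the role of the candidate identification and tracking module.

```python
def dwell_ok(event, allowed):
    """True if the event's dwelling time (end - start) lies in an allowed interval."""
    _, start, end = event
    return any(lo <= end - start <= hi for lo, hi in allowed)


def count_occurrences(series, episode):
    """Count non-overlapped occurrences of a serial episode.

    series:  list of (event_type, start, end), sorted by start time
    episode: list of (event_type, allowed_intervals) nodes, matched in order
    """
    count, state = 0, 0
    for event in series:
        etype, allowed = episode[state]
        if event[0] == etype and dwell_ok(event, allowed):
            state += 1
            if state == len(episode):  # full occurrence tracked: increment count
                count, state = count + 1, 0
    return count


def frequent_episodes(series, candidates, threshold):
    """Output (episode, frequency) for candidates whose frequency exceeds threshold."""
    n = len(series)
    result = []
    for ep in candidates:
        freq = count_occurrences(series, ep) / n
        if freq > threshold:
            result.append((ep, freq))
    return result
```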

Relevance:

20.00%

Abstract:

Researchers can use bond graph modeling, a tool that takes into account the energy conservation principle, to accurately assess the dynamic behavior of wireless sensor networks on a continuous basis.

Relevance:

20.00%

Abstract:

Wireless sensor networks can often be viewed in terms of a uniform deployment of a large number of nodes in a region of Euclidean space. Following deployment, the nodes self-organize into a mesh topology, a key aspect of which is self-localization. For a mesh topology obtained from a dense, homogeneous deployment, a frequently used approximation takes the hop distance between nodes to be proportional to the Euclidean distance between them. In this work, we analyze this approximation through two complementary analyses. We assume that the mesh topology is a random geometric graph on the nodes, and that some nodes are designated as anchors with known locations. First, we obtain high-probability bounds on the Euclidean distances of all nodes that are h hops away from a fixed anchor node. In the second analysis, we provide a heuristic argument that leads to a direct approximation for the density function of the Euclidean distance between two nodes that are separated by a hop distance h. Simulations show that this approximation very closely matches the true density function. Localization algorithms that draw upon the preceding analyses are then proposed and shown to perform better than some of the well-known algorithms in the literature. Belief-propagation-based message passing is then used to further enhance the performance of the proposed localization algorithms. To our knowledge, this is the first use of message passing for hop-count-based self-localization.
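The proportionality approximation can be inspected numerically. The sketch below assumes a uniform deployment in the unit square, builds a random geometric graph, computes hop counts from one anchor by breadth-first search and fits the constant c in distance ~ c * hops by least squares through the origin. It is a DV-hop-style estimate for illustration, not the paper's high-probability bounds or its density approximation; names are hypothetical.

```python
from collections import deque

import numpy as np


def random_geometric_graph(n, radius, rng):
    """Uniform nodes in the unit square; link every pair within `radius`."""
    pts = rng.random((n, 2))
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adj = (d <= radius) & ~np.eye(n, dtype=bool)
    return pts, adj


def hop_counts(adj, source):
    """Breadth-first search hop distance from `source`; -1 marks unreachable nodes."""
    hops = np.full(len(adj), -1)
    hops[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]):
            if hops[v] < 0:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def distance_per_hop(pts, hops, anchor):
    """Least-squares slope c in euclidean_distance ~ c * hop_count (through origin)."""
    mask = hops > 0  # skip the anchor itself and unreachable nodes
    h = hops[mask].astype(float)
    dist = np.linalg.norm(pts[mask] - pts[anchor], axis=1)
    return float(h @ dist / (h @ h))
```

Because each hop spans at most the connection radius, the fitted c can never exceed the radius; in dense deployments it typically falls somewhat below it.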

Relevance:

20.00%

Abstract:

In this paper, we employ message passing algorithms over graphical models to jointly detect and decode symbols transmitted over large multiple-input multiple-output (MIMO) channels with low density parity check (LDPC) coded bits. We adopt a factor graph based technique to integrate the detection and decoding operations. A Gaussian approximation of spatial interference is used for detection. This serves as a low-complexity joint detection/decoding approach for large dimensional MIMO systems coded with LDPC codes of large block lengths. This joint processing achieves significantly better performance than performing detection and decoding separately.
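To show what a Gaussian approximation of spatial interference means inside a message-passing detector, here is a minimal sketch for a real-valued MIMO channel with BPSK symbols. Each observation node treats the contribution of all other symbols as Gaussian with matching mean and variance and sends a log-likelihood ratio (LLR) to each symbol node. The LDPC decoding half of the joint scheme is omitted (in a joint receiver these LLRs would feed the code's factor graph), and the damping factor, iteration count and function name are assumptions.

```python
import numpy as np


def ga_bp_detect(y, H, noise_var, iters=10, damping=0.5):
    """Message passing with a Gaussian approximation of interference (real BPSK).

    Model: y = H x + n, x in {-1, +1}^nt, n ~ N(0, noise_var * I).
    Returns hard decisions and the final per-symbol LLRs.
    """
    nr, nt = H.shape
    msg = np.zeros((nr, nt))  # LLR message from observation i to symbol k
    for _ in range(iters):
        total = msg.sum(axis=0)        # aggregate LLR at each symbol node
        ext = total[None, :] - msg     # extrinsic: drop observation i's own message
        m = np.tanh(ext / 2)           # soft symbol mean E[x_k] sent to node i
        v = 1 - m ** 2                 # soft symbol variance
        hm = H * m
        hv = H ** 2 * v
        # Interference on y_i seen by symbol k: every symbol except k, plus noise.
        mu = hm.sum(axis=1, keepdims=True) - hm
        var = hv.sum(axis=1, keepdims=True) - hv + noise_var
        # LLR of x_k from y_i under the Gaussian approximation: 2 h (y - mu) / var.
        new = 2 * H * (y[:, None] - mu) / var
        msg = damping * msg + (1 - damping) * new
    llr = msg.sum(axis=0)
    return np.where(llr >= 0, 1, -1), llr
```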

Relevance:

20.00%

Abstract:

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence in order to find biologically relevant patterns. We present the suite of tools and its application to the analysis of the whole human genome sequence.
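A minimal sketch of windowed n-gram perplexity: an add-one-smoothed n-gram model is trained on the whole sequence and each window is scored against it, so low-perplexity windows flag repetitive regions. It uses plain Python counters rather than the suffix arrays of the Biological Language Modeling Toolkit, and the window width, step and smoothing are illustrative assumptions.

```python
import math
from collections import Counter


def ngram_model(seq, n):
    """Counts for an order-n model: n-grams and their (n-1)-symbol contexts."""
    grams = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    contexts = Counter(seq[i:i + n - 1] for i in range(len(seq) - n + 1))
    return grams, contexts


def perplexity(window, grams, contexts, n, alphabet="ACGT"):
    """Per-symbol perplexity of `window` under an add-one-smoothed n-gram model."""
    log_prob, count = 0.0, 0
    for i in range(len(window) - n + 1):
        gram = window[i:i + n]
        p = (grams[gram] + 1) / (contexts[gram[:-1]] + len(alphabet))
        log_prob += math.log2(p)
        count += 1
    return 2 ** (-log_prob / count)


def windowed_perplexity(genome, n=3, width=50, step=25):
    """Slide a window along the sequence; return (start, perplexity) pairs."""
    grams, contexts = ngram_model(genome, n)
    return [(s, perplexity(genome[s:s + width], grams, contexts, n))
            for s in range(0, len(genome) - width + 1, step)]
```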

Relevance:

20.00%

Abstract:

Points-to analysis is a key compiler analysis. Several memory-related optimizations use points-to information to improve their effectiveness. Points-to analysis is performed by building a constraint graph of pointer variables and dynamically updating it to propagate more and more points-to information across its subset edges. So far, the structure of the constraint graph has been exploited only trivially for efficient propagation of information, e.g., in identifying cyclic components or in propagating information in topological order. We perform a careful study of its structure and propose a new inclusion-based, flow-insensitive, context-sensitive points-to analysis algorithm based on the notion of dominant pointers. We also propose a new kind of pointer equivalence based on dominant pointers, which provides significantly more opportunities for reducing the number of pointers tracked during the analysis. Based on this hitherto unexplored form of pointer equivalence, we develop a new context-sensitive, flow-insensitive points-to analysis algorithm that uses incremental dominator updates to compute points-to information efficiently. Using a large suite of programs consisting of SPEC 2000 benchmarks and five large open-source programs, we show that our points-to analysis is 88% faster than BDD-based Lazy Cycle Detection and 2x faster than Deep Propagation. We argue that our approach of detecting dominator-based pointer equivalence is key to improving the efficiency of points-to analysis.
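For readers unfamiliar with the baseline, the sketch below is a plain inclusion-based (Andersen-style) analysis: points-to facts propagate along the subset edges of a constraint graph, and load/store constraints add new edges as points-to sets grow. It is context-insensitive and does not implement the paper's dominant-pointer equivalence or incremental dominator updates; the constraint encoding and names are assumptions.

```python
from collections import defaultdict, deque


def andersen(address_of, copies, loads, stores):
    """Inclusion-based, flow- and context-insensitive points-to analysis.

    address_of: [(p, a)] for p = &a
    copies:     [(p, q)] for p = q   (subset edge q -> p)
    loads:      [(p, q)] for p = *q
    stores:     [(p, q)] for *p = q
    Returns a dict mapping each variable to its non-empty points-to set.
    """
    pts = defaultdict(set)
    succ = defaultdict(set)          # constraint graph: subset edges src -> dst
    for p, q in copies:
        succ[q].add(p)
    load_of = defaultdict(list)      # q -> [p] for p = *q
    store_to = defaultdict(list)     # p -> [q] for *p = q
    for p, q in loads:
        load_of[q].append(p)
    for p, q in stores:
        store_to[p].append(q)

    work = deque()
    for p, a in address_of:
        pts[p].add(a)
        work.append(p)

    while work:
        n = work.popleft()
        # Complex constraints: each target a of n yields a new subset edge.
        for a in list(pts[n]):
            for p in load_of[n]:     # p = *n  gives edge a -> p
                if p not in succ[a]:
                    succ[a].add(p)
                    work.append(a)
            for q in store_to[n]:    # *n = q  gives edge q -> a
                if a not in succ[q]:
                    succ[q].add(a)
                    work.append(q)
        # Propagate n's points-to set along its outgoing subset edges.
        for m in list(succ[n]):
            before = len(pts[m])
            pts[m] |= pts[n]
            if len(pts[m]) != before:
                work.append(m)
    return {k: v for k, v in pts.items() if v}
```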

Relevance:

20.00%

Abstract:

Song selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified by genre, which does not entirely reflect sentiment. Thus, we require an unsupervised scheme to mine sentiments. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, which involves multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods, and the topics mined by LDA can represent moods. Thus we obtain a scheme for collecting songs with similar moods. For validation, we use a dataset of songs annotated with 6 moods by users of a particular website.
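The LDA step can be sketched with a small collapsed Gibbs sampler over tokenised lyrics, where each inferred topic is a candidate mood and each song receives a mixture over topics. The hyperparameters, the number of sweeps and the function name are assumptions; the abstract does not specify which LDA inference method was used.

```python
import numpy as np


def lda_gibbs(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of songs, each a list of word ids from its lyrics.
    Returns (doc_topic, topic_word): per-song topic (mood) mixtures and
    per-topic word distributions.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # topic counts per song
    nkw = np.zeros((n_topics, vocab_size))  # word counts per topic
    nk = np.zeros(n_topics)                 # total tokens per topic
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]  # random init

    def update(d, w, k, delta):
        ndk[d, k] += delta
        nkw[k, w] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token's current topic
                # Full conditional p(z = k | all other assignments), unnormalised.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                update(d, w, k, 1)

    doc_topic = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    topic_word = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return doc_topic, topic_word
```

Songs whose dominant topic coincides can then be grouped as similar-mood songs; mapping topics to named moods still needs labelled data such as the annotated dataset mentioned above.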

Relevance:

20.00%

Abstract:

In many real-world prediction problems the output is a structured object like a sequence, a tree or a graph. Such problems range from natural language processing to computational biology or computer vision and have been tackled using algorithms referred to as structured output learning algorithms. We consider the problem of structured classification. In the last few years, large margin classifiers like support vector machines (SVMs) have shown much promise for structured output learning. The related optimization problem is a convex quadratic program (QP) with a large number of constraints, which makes the problem intractable for large data sets. This paper proposes a fast sequential dual method (SDM) for structural SVMs. The method makes repeated passes over the training set and optimizes the dual variables associated with one example at a time. The use of additional heuristics makes the proposed method more efficient. We present an extensive empirical evaluation of the proposed method on several sequence learning problems. Our experiments on large data sets demonstrate that the proposed method is an order of magnitude faster than state-of-the-art methods like the cutting-plane method and the stochastic gradient descent (SGD) method. Further, SDM reaches steady-state generalization performance faster than the SGD method. The proposed SDM is thus a useful alternative for large-scale structured output learning.
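The per-example dual update at the heart of a sequential dual method can be sketched on the simplest structured output, multiclass classification, where loss-augmented inference is just an argmax over labels (sequence labelling would need Viterbi instead). Each visit to an example performs a few SMO-style pairwise steps on that example's dual variables, moving weight from the active label with the smallest dual gradient to the most violated label. This is a simplified illustration under those assumptions, without the paper's additional heuristics; names and defaults are hypothetical.

```python
import numpy as np


def sdm_multiclass(X, y, n_classes, C=1.0, epochs=20, inner_steps=5):
    """Per-example dual coordinate ascent for a multiclass structural SVM (sketch).

    Joint features place x in the block of its label, so the weight vector is a
    (n_classes, d) matrix W with score(x, c) = W[c] @ x; the loss is 0/1.
    Dual variables alpha[i, c] >= 0 satisfy sum_c alpha[i, c] = C, where
    alpha[i, y_i] acts as a slack; alpha[i, y_i] = C initially, so W = 0.
    """
    n, d = X.shape
    W = np.zeros((n_classes, d))
    alpha = np.zeros((n, n_classes))
    alpha[np.arange(n), y] = C
    labels = np.arange(n_classes)
    for _ in range(epochs):                  # repeated passes over the training set
        for i in range(n):                   # optimise only example i's dual variables
            x, yi = X[i], y[i]
            xx = x @ x
            if xx == 0:
                continue
            for _ in range(inner_steps):
                scores = W @ x
                # Dual gradient: g[c] = loss(y_i, c) - (score(y_i) - score(c)).
                g = (labels != yi).astype(float) - (scores[yi] - scores)
                up = int(np.argmax(g))       # most violated label (loss-augmented argmax)
                active = np.flatnonzero(alpha[i] > 0)
                down = int(active[np.argmin(g[active])])
                if g[up] - g[down] < 1e-8:   # example i's block is (nearly) optimal
                    break
                # Shift mass t from alpha[i, down] to alpha[i, up]; curvature is 2 * ||x||^2.
                t = min((g[up] - g[down]) / (2 * xx), alpha[i, down])
                alpha[i, down] -= t
                alpha[i, up] += t
                W[down] += t * x
                W[up] -= t * x
    return W
```

Predictions are `(X @ W.T).argmax(axis=1)`. Because each pairwise step keeps every example's dual variables feasible and never decreases the dual objective, repeated passes move steadily toward the structural SVM optimum.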