851 resultados para Graph mining
Resumo:
In this article we study the one-dimensional random geometric (random interval) graph when the location of the nodes are independent and exponentially distributed. We derive exact results and limit theorems for the connectivity and other properties associated with this random graph. We show that the asymptotic properties of a graph with a truncated exponential distribution can be obtained using the exponential random geometric graph. © 2007 Wiley Periodicals, Inc. Random Struct. Alg., 2008.
Resumo:
We consider a multicommodity flow problem on a complete graph whose edges have random, independent, and identically distributed capacities. We show that, as the number of nodes tends to infinity, the maximumutility, given by the average of a concave function of each commodity How, has an almost-sure limit. Furthermore, the asymptotically optimal flow uses only direct and two-hop paths, and can be obtained in a distributed manner.
Resumo:
Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.
Resumo:
Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.
Resumo:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVM are used for the former while latent semantic analysis is used for the latter The techniques are demonstrated for the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
Resumo:
A distributed system is a collection of networked autonomous processing units which must work in a cooperative manner. Currently, large-scale distributed systems, such as various telecommunication and computer networks, are abundant and used in a multitude of tasks. The field of distributed computing studies what can be computed efficiently in such systems. Distributed systems are usually modelled as graphs where nodes represent the processors and edges denote communication links between processors. This thesis concentrates on the computational complexity of the distributed graph colouring problem. The objective of the graph colouring problem is to assign a colour to each node in such a way that no two nodes connected by an edge share the same colour. In particular, it is often desirable to use only a small number of colours. This task is a fundamental symmetry-breaking primitive in various distributed algorithms. A graph that has been coloured in this manner using at most k different colours is said to be k-coloured. This work examines the synchronous message-passing model of distributed computation: every node runs the same algorithm, and the system operates in discrete synchronous communication rounds. During each round, a node can communicate with its neighbours and perform local computation. In this model, the time complexity of a problem is the number of synchronous communication rounds required to solve the problem. It is known that 3-colouring any k-coloured directed cycle requires at least ½(log* k - 3) communication rounds and is possible in ½(log* k + 7) communication rounds for all k ≥ 3. This work shows that for any k ≥ 3, colouring a k-coloured directed cycle with at most three colours is possible in ½(log* k + 3) rounds. In contrast, it is also shown that for some values of k, colouring a directed cycle with at most three colours requires at least ½(log* k + 1) communication rounds. Furthermore, in the case of directed rooted trees, reducing a k-colouring into a 3-colouring requires at least log* k + 1 rounds for some k and possible in log* k + 3 rounds for all k ≥ 3. The new positive and negative results are derived using computational methods, as the existence of distributed colouring algorithms corresponds to the colourability of so-called neighbourhood graphs. The colourability of these graphs is analysed using Boolean satisfiability (SAT) solvers. Finally, this thesis shows that similar methods are applicable in capturing the existence of distributed algorithms for other graph problems, such as the maximal matching problem.
Resumo:
A k-dimensional box is the Cartesian product R-1 X R-2 X ... X R-k where each R-i is a closed interval on the real line. The boxicity of a graph G, denoted as box(G), is the minimum integer k such that G can be represented as the intersection graph of a collection of k-dimensional boxes. A unit cube in k-dimensional space or a k-cube is defined as the Cartesian product R-1 X R-2 X ... X R-k where each R-i is a closed interval oil the real line of the form a(i), a(i) + 1]. The cubicity of G, denoted as cub(G), is the minimum integer k such that G can be represented as the intersection graph of a collection of k-cubes. The threshold dimension of a graph G(V, E) is the smallest integer k such that E can be covered by k threshold spanning subgraphs of G. In this paper we will show that there exists no polynomial-time algorithm for approximating the threshold dimension of a graph on n vertices with a factor of O(n(0.5-epsilon)) for any epsilon > 0 unless NP = ZPP. From this result we will show that there exists no polynomial-time algorithm for approximating the boxicity and the cubicity of a graph on n vertices with factor O(n(0.5-epsilon)) for any epsilon > 0 unless NP = ZPP. In fact all these hardness results hold even for a highly structured class of graphs, namely the split graphs. We will also show that it is NP-complete to determine whether a given split graph has boxicity at most 3. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Modelling of city traffic involves capturing of all the dynamics that exist in real-time traffic. Probabilistic models and queuing theory have been used for mathematical representation of the traffic system. This paper proposes the concept of modelling the traffic system using bond graphs wherein traffic flow is based on energy conservation. The proposed modelling approach uses switched junctions to model complex traffic networks. This paper presents the modelling, simulation and experimental validation aspects.
Resumo:
We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplication, nodes and (unweighted) edges are grouped to supernodes and superedges, respectively, to obtain a smaller graph. We propose models and algorithms for weighted graphs. The interpretation (i.e. decompression) of a compressed, weighted graph is that a pair of original nodes is connected by an edge if their supernodes are connected by one, and that the weight of an edge is approximated to be the weight of the superedge. The compression problem now consists of choosing supernodes, superedges, and superedge weights so that the approximation error is minimized while the amount of compression is maximized. In this paper, we formulate this task as the 'simple weighted graph compression problem'. We then propose a much wider class of tasks under the name of 'generalized weighted graph compression problem'. The generalized task extends the optimization to preserve longer-range connectivities between nodes, not just individual edge weights. We study the properties of these problems and propose a range of algorithms to solve them, with dierent balances between complexity and quality of the result. We evaluate the problems and algorithms experimentally on real networks. The results indicate that weighted graphs can be compressed efficiently with relatively little compression error.
Resumo:
A generalization of Nash-Williams′ lemma is proved for the Structure of m-uniform null (m − k)-designs. It is then applied to various graph reconstruction problems. A short combinatorial proof of the edge reconstructibility of digraphs having regular underlying undirected graphs (e.g., tournaments) is given. A type of Nash-Williams′ lemma is conjectured for the vertex reconstruction problem.
Resumo:
Land cover (LC) changes play a major role in global as well as at regional scale patterns of the climate and biogeochemistry of the Earth system. LC information presents critical insights in understanding of Earth surface phenomena, particularly useful when obtained synoptically from remote sensing data. However, for developing countries and those with large geographical extent, regular LC mapping is prohibitive with data from commercial sensors (high cost factor) of limited spatial coverage (low temporal resolution and band swath). In this context, free MODIS data with good spectro-temporal resolution meet the purpose. LC mapping from these data has continuously evolved with advances in classification algorithms. This paper presents a comparative study of two robust data mining techniques, the multilayer perceptron (MLP) and decision tree (DT) on different products of MODIS data corresponding to Kolar district, Karnataka, India. The MODIS classified images when compared at three different spatial scales (at district level, taluk level and pixel level) shows that MLP based classification on minimum noise fraction components on MODIS 36 bands provide the most accurate LC mapping with 86% accuracy, while DT on MODIS 36 bands principal components leads to less accurate classification (69%).
Resumo:
Let G be an undirected graph with a positive real weight on each edge. It is shown that the number of minimum-weight cycles of G is bounded above by a polynomial in the number of edges of G. A similar bound holds if we wish to count the number of cycles with weight at most a constant multiple of the minimum weight of a cycle of G.