23 resultados para Graph analytics
                                
Resumo:
In this paper we evaluate and compare two representativeand popular distributed processing engines for large scalebig data analytics, Spark and graph based engine GraphLab. Wedesign a benchmark suite including representative algorithmsand datasets to compare the performances of the computingengines, from performance aspects of running time, memory andCPU usage, network and I/O overhead. The benchmark suite istested on both local computer cluster and virtual machines oncloud. By varying the number of computers and memory weexamine the scalability of the computing engines with increasingcomputing resources (such as CPU and memory). We also runcross-evaluation of generic and graph based analytic algorithmsover graph processing and generic platforms to identify thepotential performance degradation if only one processing engineis available. It is observed that both computing engines showgood scalability with increase of computing resources. WhileGraphLab largely outperforms Spark for graph algorithms, ithas close running time performance as Spark for non-graphalgorithms. Additionally the running time with Spark for graphalgorithms over cloud virtual machines is observed to increaseby almost 100% compared to over local computer clusters.
                                
Resumo:
Over the last decade, there has been a trend where water utility companies aim to make water distribution networks more intelligent in order to improve their quality of service, reduce water waste, minimize maintenance costs etc., by incorporating IoT technologies. Current state of the art solutions use expensive power hungry deployments to monitor and transmit water network states periodically in order to detect anomalous behaviors such as water leakage and bursts. However, more than 97% of water network assets are remote away from power and are often in geographically remote underpopulated areas, facts that make current approaches unsuitable for next generation more dynamic adaptive water networks. Battery-driven wireless sensor/actuator based solutions are theoretically the perfect choice to support next generation water distribution. In this paper, we present an end-to-end water leak localization system, which exploits edge processing and enables the use of battery-driven sensor nodes. Our system combines a lightweight edge anomaly detection algorithm based on compression rates and an efficient localization algorithm based on graph theory. The edge anomaly detection and localization elements of the systems produce a timely and accurate localization result and reduce the communication by 99% compared to the traditional periodic communication. We evaluated our schemes by deploying non-intrusive sensors measuring vibrational data on a real-world water test rig that have had controlled leakage and burst scenarios implemented.
                                
Resumo:
The problem of vertex coloring in random graphs is studied using methods of statistical physics and probability. Our analytical results are compared to those obtained by exact enumeration and Monte Carlo simulations. We critically discuss the merits and shortcomings of the various methods, and interpret the results obtained. We present an exact analytical expression for the two-coloring problem as well as general replica symmetric approximated solutions for the thermodynamics of the graph coloring problem with p colors and K-body edges. ©2002 The American Physical Society.
                                
Resumo:
This thesis includes analysis of disordered spin ensembles corresponding to Exact Cover, a multi-access channel problem, and composite models combining sparse and dense interactions. The satisfiability problem in Exact Cover is addressed using a statistical analysis of a simple branch and bound algorithm. The algorithm can be formulated in the large system limit as a branching process, for which critical properties can be analysed. Far from the critical point a set of differential equations may be used to model the process, and these are solved by numerical integration and exact bounding methods. The multi-access channel problem is formulated as an equilibrium statistical physics problem for the case of bit transmission on a channel with power control and synchronisation. A sparse code division multiple access method is considered and the optimal detection properties are examined in typical case by use of the replica method, and compared to detection performance achieved by interactive decoding methods. These codes are found to have phenomena closely resembling the well-understood dense codes. The composite model is introduced as an abstraction of canonical sparse and dense disordered spin models. The model includes couplings due to both dense and sparse topologies simultaneously. The new type of codes are shown to outperform sparse and dense codes in some regimes both in optimal performance, and in performance achieved by iterative detection methods in finite systems.
                                
Resumo:
In this article, we discuss the state of the art of models for customer engagement and the problems that are inherent to calibrating and implementing these models. The authors first provide an overview of the data available for customer analytics and discuss recent developments. Next, the authors discuss the models used for studying customer engagement, where they distinguish the following stages: customer acquisition, customer development, and customer retention. Finally, they discuss several organizational issues of analytics for customer engagement, which constitute barriers for introducing analytics for customer engagement.
                                
Resumo:
In this paper we propose an approach based on self-interested autonomous cameras, which exchange responsibility for tracking objects in a market mechanism, in order to maximise their own utility. A novel ant-colony inspired mechanism is used to grow the vision graph during runtime, which may then be used to optimise communication between cameras. The key benefits of our completely decentralised approach are on the one hand generating the vision graph online which permits the addition and removal cameras to the network during runtime and on the other hand relying only on local information, increasing the robustness of the system. Since our market-based approach does not rely on a priori topology information, the need for any multi-camera calibration can be avoided. © 2011 IEEE.
                                
Resumo:
In this article we present an approach to object tracking handover in a network of smart cameras, based on self-interested autonomous agents, which exchange responsibility for tracking objects in a market mechanism, in order to maximise their own utility. A novel ant-colony inspired mechanism is used to learn the vision graph, that is, the camera neighbourhood relations, during runtime, which may then be used to optimise communication between cameras. The key benefits of our completely decentralised approach are on the one hand generating the vision graph online, enabling efficient deployment in unknown scenarios and camera network topologies, and on the other hand relying only on local information, increasing the robustness of the system. Since our market-based approach does not rely on a priori topology information, the need for any multicamera calibration can be avoided. We have evaluated our approach both in a simulation study and in network of real distributed smart cameras.
                                
Resumo:
Graph embedding is a general framework for subspace learning. However, because of the well-known outlier-sensitiveness disadvantage of the L2-norm, conventional graph embedding is not robust to outliers which occur in many practical applications. In this paper, an improved graph embedding algorithm (termed LPP-L1) is proposed by replacing L2-norm with L1-norm. In addition to its robustness property, LPP-L1 avoids small sample size problem. Experimental results on both synthetic and real-world data demonstrate these advantages. © 2009 Elsevier B.V. All rights reserved.
                                
Resumo:
GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs on a single PC efficiently. GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. With the novel technique of parallel sliding windows (PSW) to load subgraph from disk to memory for vertices and edges updating, it can achieve data processing performance close to and even better than those of mainstream distributed graph engines. GraphChi mentioned that its memory is not effectively utilized with large dataset, which leads to suboptimal computation performances. In this paper we are motivated by the concepts of 'pin ' from TurboGraph and 'ghost' from GraphLab to propose a new memory utilization mode for GraphChi, which is called Part-in-memory mode, to improve the GraphChi algorithm performance. The main idea is to pin a fixed part of data inside the memory during the whole computing process. Part-in-memory mode is successfully implemented with only about 40 additional lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including Twitter graph with 1.4 billion edges). The preliminary results show that Part-in-memory mode memory management approach effectively reduces the GraphChi running time by up to 60% in PageRank algorithm. Interestingly it is found that a larger portion of data pinned in memory does not always lead to better performance in the case that the whole dataset cannot be fitted in memory. There exists an optimal portion of data which should be kept in the memory to achieve the best computational performance.
                                
Resumo:
Short text messages a.k.a Microposts (e.g. Tweets) have proven to be an effective channel for revealing information about trends and events, ranging from those related to Disaster (e.g. hurricane Sandy) to those related to Violence (e.g. Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to immediately respond. In this work we study the problem of topic classification (TC) of Microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of Microposts however is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to Microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of Microposts with features extracted only from the Microposts content. In contrast our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called category meta-graph. This novel meta-graph provides a more fine grained categorisation of concepts providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of Microposts. Furthermore our goal is also to understand which semantic feature contributes to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of accuracy loss of a topic classifier on new, unseen Microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and Microposts at a conceptual level, considering the enriched representation of these documents. Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches using single KS without linked data and Twitter data only up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures. © 2014 Elsevier B.V. All rights reserved.
                                
Resumo:
This paper considers the problem of low-dimensional visualisation of very high dimensional information sources for the purpose of situation awareness in the maritime environment. In response to the requirement for human decision support aids to reduce information overload (and specifically, data amenable to inter-point relative similarity measures) appropriate to the below-water maritime domain, we are investigating a preliminary prototype topographic visualisation model. The focus of the current paper is on the mathematical problem of exploiting a relative dissimilarity representation of signals in a visual informatics mapping model, driven by real-world sonar systems. A realistic noise model is explored and incorporated into non-linear and topographic visualisation algorithms building on the approach of [9]. Concepts are illustrated using a real world dataset of 32 hydrophones monitoring a shallow-water environment in which targets are present and dynamic.
                                
Resumo:
In this paper we propose a quantum algorithm to measure the similarity between a pair of unattributed graphs. We design an experiment where the two graphs are merged by establishing a complete set of connections between their nodes and the resulting structure is probed through the evolution of continuous-time quantum walks. In order to analyze the behavior of the walks without causing wave function collapse, we base our analysis on the recently introduced quantum Jensen-Shannon divergence. In particular, we show that the divergence between the evolution of two suitably initialized quantum walks over this structure is maximum when the original pair of graphs is isomorphic. We also prove that under special conditions the divergence is minimum when the sets of eigenvalues of the Hamiltonians associated with the two original graphs have an empty intersection.
                                
Resumo:
In this paper, we use the quantum Jensen-Shannon divergence as a means of measuring the information theoretic dissimilarity of graphs and thus develop a novel graph kernel. In quantum mechanics, the quantum Jensen-Shannon divergence can be used to measure the dissimilarity of quantum systems specified in terms of their density matrices. We commence by computing the density matrix associated with a continuous-time quantum walk over each graph being compared. In particular, we adopt the closed form solution of the density matrix introduced in Rossi et al. (2013) [27,28] to reduce the computational complexity and to avoid the cumbersome task of simulating the quantum walk evolution explicitly. Next, we compare the mixed states represented by the density matrices using the quantum Jensen-Shannon divergence. With the quantum states for a pair of graphs described by their density matrices to hand, the quantum graph kernel between the pair of graphs is defined using the quantum Jensen-Shannon divergence between the graph density matrices. We evaluate the performance of our kernel on several standard graph datasets from both bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed quantum graph kernel.
                                
Resumo:
In this paper, we develop a new graph kernel by using the quantum Jensen-Shannon divergence and the discrete-time quantum walk. To this end, we commence by performing a discrete-time quantum walk to compute a density matrix over each graph being compared. For a pair of graphs, we compare the mixed quantum states represented by their density matrices using the quantum Jensen-Shannon divergence. With the density matrices for a pair of graphs to hand, the quantum graph kernel between the pair of graphs is defined by exponentiating the negative quantum Jensen-Shannon divergence between the graph density matrices. We evaluate the performance of our kernel on several standard graph datasets, and demonstrate the effectiveness of the new kernel.
                                
Resumo:
We propose a family of attributed graph kernels based on mutual information measures, i.e., the Jensen-Tsallis (JT) q-differences (for q ∈ [1,2]) between probability distributions over the graphs. To this end, we first assign a probability to each vertex of the graph through a continuous-time quantum walk (CTQW). We then adopt the tree-index approach [1] to strengthen the original vertex labels, and we show how the CTQW can induce a probability distribution over these strengthened labels. We show that our JT kernel (for q = 1) overcomes the shortcoming of discarding non-isomorphic substructures arising in the R-convolution kernels. Moreover, we prove that the proposed JT kernels generalize the Jensen-Shannon graph kernel [2] (for q = 1) and the classical subtree kernel [3] (for q = 2), respectively. Experimental evaluations demonstrate the effectiveness and efficiency of the JT kernels.
 
                    