8 resultados para graph algorithms

em Deakin Research Online - Australia


Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a new technique to perform unsupervised data classification (clustering) based on density induced metric and non-smooth optimization. Our goal is to automatically recognize multidimensional clusters of non-convex shape. We present a modification of the fuzzy c-means algorithm, which uses the data induced metric, defined with the help of Delaunay triangulation. We detail computation of the distances in such a metric using graph algorithms. To find optimal positions of cluster prototypes we employ the discrete gradient method of non-smooth optimization. The new clustering method is capable to identify non-convex overlapped d-dimensional clusters.


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Recent advances in high throughput experiments and annotations via published literature have provided a wealth of interaction maps of several biomolecular networks, including metabolic, protein-protein, and protein-DNA interaction networks. The architecture of these molecular networks reveals important principles of cellular organization and molecular functions. Analyzing such networks, i.e., discovering dense regions in the network, is an important way to identify protein complexes and functional modules. This task has been formulated as the problem of finding heavy subgraphs, the Heaviest k-Subgraph Problem (k-HSP), which itself is NPhard. However, any method based on the k-HSP requires the parameter k and an exact solution of k-HSP may still end up as a “spurious” heavy subgraph, thus reducing its practicability in analyzing large scale biological networks. We proposed a new formulation, called the rank-HSP, and two dynamical systems to approximate its results. In addition, a novel metric, called the Standard deviation and Mean Ratio (SMR), is proposed for use in “spurious” heavy subgraphs to automate the discovery by setting a fixed threshold. Empirical results on both the simulated graphs and biological networks have demonstrated the efficiency and effectiveness of our proposal.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Lung modelling has emerged as a useful method for diagnosing lung diseases. Image segmentation is an important part of lung modelling systems. The ill-defined nature of image segmentation makes automated lung modelling difficult. Also, low resolution of lung images further increases the difficulty of the lung image segmentation. It is therefore important to identify a suitable segmentation algorithm that can enhance lung modelling accuracies. This paper investigates six image segmentation algorithms, used in medical imaging, and also their application to lung modelling. The algorithms are: normalised cuts, graph, region growing, watershed, Markov random field, and mean shift. The performance of the six segmentation algorithms is determined through a set of experiments on realistic 2D CT lung images. An experimental procedure is devised to measure the performance of the tested algorithms. The measured segmentation accuracies as well as execution times of the six algorithms are then compared and discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell’s growth and function [1]. Previous studies on proteomics indicate that the functions of different proteins could be assigned based upon protein structures [2,3]. The knowledge on protein structures gives us an overview of protein fold space and is helpful for the understanding of the evolutionary principles behind structure. By observing the architectures and topologies of the protein families, biological processes can be investigated more directly with much higher resolution and finer detail. For this reason, the analysis of protein, its structure and the interaction with the other materials is emerging as an important problem in bioinformatics. However, the determination of protein structures is experimentally expensive and time consuming, this makes scientists largely dependent on sequence rather than more general structure to infer the function of the protein at the present time. For this reason, data mining technology is introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications which lack available data, the protein structure determination problem and its interaction study, on the contrary, could utilize a vast amount of biologically relevant information on protein and its interaction, such as the protein data bank (PDB) [4], the structural classification of proteins (SCOP) databases [5], CATH databases [6], UniProt [7], and others. The difficulty of predicting protein structures, specially its 3D structures, and the interactions between proteins as shown in Figure 6.1, lies in the computational complexity of the data. Although a large number of approaches have been developed to determine the protein structures such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we will introduce a state-of-the-art data mining technique, graph mining, which is good at defining and discovering interesting structural patterns in graphical data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). The current graph pattern mining methods will be described, and typical algorithms will be presented, together with their applications in the protein structure analysis.

The rest of the chapter is organized as follows: Section 6.2 will give a brief introduction of the fundamental knowledge of protein, the publicly accessible protein data resources and the current research status of protein analysis; in Section 6.3, we will pay attention to one of the state-of-the-art data mining methods, graph mining; then Section 6.4 surveys several existing work for protein structure analysis using advanced graph mining methods in the recent decade; finally, in Section 6.5, a conclusion with potential further work will be summarized.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A critical question in data mining is that can we always trust what discovered by a data mining system unconditionally? The answer is obviously not. If not, when can we trust the discovery then? What are the factors that affect the reliability of the discovery? How do they affect the reliability of the discovery? These are some interesting questions to be investigated. In this chapter we will firstly provide a definition and the measurements of reliability, and analyse the factors that affect the reliability. We then examine the impact of model complexity, weak links, varying sample sizes and the ability of different learners to the reliability of graphical model discovery. The experimental results reveal that (1) the larger sample size for the discovery, the higher reliability we will get; (2) the stronger a graph link is, the easier the discovery will be and thus the higher the reliability it can achieve; (3) the complexity of a graph also plays an important role in the discovery. The higher the complexity of a graph is, the more difficult to induce the graph and the lower reliability it would be. We also examined the performance difference of different discovery algorithms. This reveals the impact of discovery process. The experimental results show the superior reliability and robustness of MML method to standard significance tests in the recovery of graph links with small samples and weak links.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Smartphone applications are getting more and more popular and pervasive in our daily life, and are also attractive to malware writers due to their limited computing source and vulnerabilities. At the same time, we possess limited understanding of our opponents in cyberspace. In this paper, we investigate the propagation model of SMS/MMS-based worms through integrating semi-Markov process and social relationship graph. In our modeling, we use semi-Markov process to characterize state transition among mobile nodes, and hire social network theory, a missing element in many previous works, to enhance the proposed mobile malware propagation model. In order to evaluate the proposed models, we have developed a specific software, and collected a large scale real-world data for this purpose. The extensive experiments indicate that the proposed models and algorithms are effective and practical. © 2014 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recommender systems have been successfully dealing with the problem of information overload. However, most recommendation methods suit to the scenarios where explicit feedback, e.g. ratings, are available, but might not be suitable for the most common scenarios with only implicit feedback. In addition, most existing methods only focus on user and item dimensions and neglect any additional contextual information, such as time and location. In this paper, we propose a graph-based generic recommendation framework, which constructs a Multi-Layer Context Graph (MLCG) from implicit feedback data, and then performs ranking algorithms in MLCG for context-aware recommendation. Specifically, MLCG incorporates a variety of contextual information into a recommendation process and models the interactions between users and items. Moreover, based on MLCG, two novel ranking methods are developed: Context-aware Personalized Random Walk (CPRW) captures user preferences and current situations, and Semantic Path-based Random Walk (SPRW) incorporates semantics of paths in MLCG into random walk model for recommendation. The experiments on two real-world datasets demonstrate the effectiveness of our approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Virtualization brought an immense commute in the modern technology especially in computer networks since last decade. The enormity of big data has led the massive graphs to be increased in size exponentially in recent years so that normal tools and algorithms are going weak to process it. Size diminution of the massive graphs is a big challenge in the current era and extraction of useful information from huge graphs is also problematic. In this paper, we presented a concept to design the virtual graph vGraph in the virtual plane above the original plane having original massive graph and proposed a novel cumulative similarity measure for vGraph. The use of vGraph is utile in lieu of massive graph in terms of space and time. Our proposed algorithm has two main parts. In the first part, virtual nodes are designed from the original nodes based on the calculation of cumulative similarity among them. In the second part, virtual edges are designed to link the virtual nodes based on the calculation of similarity measure among the original edges of the original massive graph. The algorithm is tested on synthetic and real-world datasets which shows the efficiency of our proposed algorithms.