21 results for graph matching algorithms

in Deakin Research Online - Australia


Relevance: 100.00%

Abstract:

Existing texture synthesis-from-example strategies for polygon meshes typically make use of three components: a multi-resolution mesh hierarchy that allows the overall nature of the pattern to be reproduced before filling in detail; a matching strategy that extends the synthesized texture using the best fit from a texture sample; and a transfer mechanism that copies the selected portion of the texture sample to the target surface. We introduce novel alternatives for each of these components. Use of √2-subdivision surfaces provides the mesh hierarchy and allows fine control over the surface complexity. Adaptive subdivision is used to create an even vertex distribution over the surface. Use of the graph defined by a surface region for matching, rather than a regular texture neighbourhood, provides for flexible control over the scale of the texture and allows simultaneous matching against multiple levels of an image pyramid created from the texture sample. We use graph cuts for texture transfer, adapting this scheme to the context of surface synthesis. The resulting surface textures are realistic, tolerant of local mesh detail, and comparable to results produced by texture neighbourhood sampling approaches.
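
The transfer step adapts graph cuts to surface synthesis, but the abstract gives no detail. As an illustrative stand-in, the sketch below computes the classic image-quilting minimum-error seam between two overlapping 2D patches by dynamic programming; the graph-cut formulation generalizes this idea beyond a single monotone cut.

```python
import numpy as np

def min_error_seam(overlap_a, overlap_b):
    """Dynamic-programming seam through the squared-difference surface of
    two overlapping texture patches (rows advance along the overlap)."""
    err = (overlap_a.astype(float) - overlap_b.astype(float)) ** 2
    cost = err.copy()
    for i in range(1, cost.shape[0]):
        for j in range(cost.shape[1]):
            lo, hi = max(j - 1, 0), min(j + 2, cost.shape[1])
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack from the cheapest cell in the last row.
    seam = [int(np.argmin(cost[-1]))]
    for i in range(cost.shape[0] - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, cost.shape[1])
        seam.append(lo + int(np.argmin(cost[i, lo:hi])))
    return seam[::-1]  # per-row column index of the cut
```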

Relevance: 100.00%

Abstract:

Graph matching is an important class of methods in pattern recognition. Typically, a graph representing an unknown pattern is matched with a database of models. If the database of model graphs is large, an additional factor is introduced into the overall complexity of the matching process. Various techniques for reducing the influence of this additional factor have been described in the literature. In this paper we propose to extract simple features from a graph and use them to eliminate candidate graphs from the database. The most powerful set of features and a decision tree useful for candidate elimination are found by means of the C4.5 algorithm, which was originally proposed for inductive learning of classification rules. Experimental results are reported demonstrating that efficient candidate elimination can be achieved by the proposed procedure.
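
A minimal sketch of the candidate-elimination idea, assuming graphs stored as adjacency-set dicts and a hypothetical feature set (vertex count, edge count, maximum degree); the paper instead selects its features and decision tree automatically with C4.5.

```python
def graph_features(g):
    """Cheap invariants of an adjacency-set dict {node: set(neighbours)}."""
    degrees = [len(nbrs) for nbrs in g.values()]
    return (len(g), sum(degrees) // 2, max(degrees, default=0))

def eliminate_candidates(query, models):
    """Keep only models that could still contain the query as a subgraph:
    every query invariant must be <= the corresponding model invariant."""
    qf = graph_features(query)
    return [m for m in models
            if all(q <= f for q, f in zip(qf, graph_features(m)))]
```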

Relevance: 100.00%

Abstract:

Many tasks in computer vision can be expressed as graph problems. This allows the task to be solved using a well-studied algorithm; however, many of these algorithms are of exponential complexity. This is a disadvantage when considered in the context of searching a database of images or videos for similarity. Work by Messmer and Bunke (1995) has suggested a new class of graph matching algorithms which uses a priori knowledge about a database of models to reduce the time taken during online classification. This paper presents a new algorithm which extends the earlier work to detection of the largest common subgraph.
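
For scale, a brute-force search for the largest common node-induced subgraph of two small adjacency-set graphs looks like the sketch below; its exponential running time is exactly the cost that database-driven algorithms of this kind try to avoid.

```python
def mcs_size(g1, g2):
    """Naive backtracking search for the size of the largest common
    node-induced subgraph. Exponential; illustrative only."""
    best = 0

    def extend(mapping, remaining):
        nonlocal best
        best = max(best, len(mapping))
        if not remaining:
            return
        v, rest = remaining[0], remaining[1:]
        extend(mapping, rest)  # option 1: leave v unmapped
        for w in g2:           # option 2: map v to a compatible node
            if w in mapping.values():
                continue
            # Adjacency must agree with every already-mapped pair.
            if all((u in g1[v]) == (x in g2[w]) for u, x in mapping.items()):
                mapping[v] = w
                extend(mapping, rest)
                del mapping[v]

    extend({}, list(g1))
    return best
```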

Relevance: 90.00%

Abstract:

While the largest common subgraph (LCSG) between a query and a database of models can provide an elegant and intuitive measure of similarity for many applications, it is computationally expensive to find. Recently developed algorithms for subgraph isomorphism detection take advantage of prior knowledge of a database of models to improve the speed of on-line matching. This paper presents a new algorithm based on similar principles to solve the largest common subgraph problem. The new algorithm significantly reduces the computational complexity of detecting the LCSG between a known database of models and a query given on-line.
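
The preprocessing itself is not described in the abstract, but the principle of spending offline effort on the model database can be illustrated with a hypothetical bound-and-prune scan: invariants are cached offline, models are visited in decreasing order of a trivial upper bound on the LCSG size, and the scan stops when no remaining model can beat the best match found.

```python
def preprocess(models):
    """Offline: cache vertex counts and sort the database so the graphs
    admitting the largest possible LCSG come first."""
    sized = [(m, len(m)) for m in models]
    sized.sort(key=lambda t: t[1], reverse=True)
    return sized

def best_lcsg(query, prepped, lcsg_size):
    """Online: scan models, pruning via the bound min(|Vq|, |Vm|).
    `lcsg_size` is any exact LCSG routine (e.g. the mcs_size sketch above)."""
    qn, best, best_model = len(query), 0, None
    for m, n in prepped:
        if min(qn, n) <= best:   # sorted order: nothing later can do better
            break
        size = lcsg_size(query, m)
        if size > best:
            best, best_model = size, m
    return best_model, best
```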

Relevance: 80.00%

Abstract:

Signature-based malware detection systems have been a much used response to the pervasive problem of malware. Identification of malware variants is essential to a detection system and is made possible by identifying invariant characteristics in related samples. To classify packed and polymorphic malware, this paper proposes a novel system, named Malwise, for malware classification using a fast application-level emulator to reverse the code packing transformation, and two flowgraph matching algorithms to perform classification. An exact flowgraph matching algorithm is employed that uses string-based signatures and is able to detect malware with near real-time performance. Additionally, a more effective approximate flowgraph matching algorithm is proposed that uses the decompilation technique of structuring to generate string-based signatures amenable to the string edit distance. We use real and synthetic malware to demonstrate the effectiveness and efficiency of Malwise. Using more than 15,000 real malware samples collected from honeypots, the effectiveness is validated by showing that there is an 88 percent probability that new malware is detected as a variant of existing malware. The efficiency is demonstrated on a smaller sample set of malware, where 86 percent of the samples can be classified in under 1.3 seconds.
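
The approximate matcher compares string-based signatures under the string edit distance; below is a minimal sketch with an illustrative signature format and threshold, not Malwise's actual ones.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def flowgraph_similarity(sig_a, sig_b):
    """Normalised similarity in [0, 1] between two structured signatures;
    a threshold (say 0.6 -- an assumption) would flag a variant."""
    if not sig_a and not sig_b:
        return 1.0
    return 1.0 - edit_distance(sig_a, sig_b) / max(len(sig_a), len(sig_b))
```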

Relevance: 30.00%

Abstract:

Current microarray databases use different terminologies and structures, which limits the sharing of data and collating of results between laboratories. Consequently, an effective integrated microarray data model is required. One important process in developing such an integrated database is schema matching. In this paper, we propose an effective schema matching approach called MDSM to syntactically and semantically map attributes of different microarray schemas. The contribution from this work will later be used to create microarray global schemas. Since microarray data is complex, we use a microarray ontology to improve the accuracy of measuring the similarity between attributes. The similarity relations can be represented as weighted bipartite graphs. We determine the best schema matching by computing the optimal matching in a bipartite graph using the Hungarian optimisation method. Experimental results show that our schema matching approach is effective and flexible enough to use with different kinds of database models, such as database schemas, XML schemas, and web site maps. Finally, a case study on an existing public microarray schema is carried out using the proposed method.
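
The final step is standard assignment on the weighted bipartite graph; a small sketch using SciPy's Hungarian-method implementation on a made-up attribute similarity matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical similarities between attributes of two schemas
# (rows: schema A, columns: schema B), as might be produced by the
# combined syntactic and ontology-based measures.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.8, 0.1],
                [0.2, 0.2, 0.7]])

rows, cols = linear_sum_assignment(sim, maximize=True)
for r, c in zip(rows, cols):
    print(f"A.attr{r} <-> B.attr{c} (similarity {sim[r, c]:.2f})")
```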

Relevance: 30.00%

Abstract:

With an increasing emphasis on emerging automatic person identification applications, biometrics-based identification, especially fingerprint-based identification, is receiving a lot of attention. This research developed an automatic fingerprint recognition system (AFRS) based on a hybrid of minutiae-based and correlation-based techniques to represent and match fingerprints; it also improved each technique individually. In the hybrid approach, improving the minutiae extraction algorithm in the post-processing phase that combines the two algorithms improved the performance of the minutiae algorithm. An improvement was also observed in the ridge algorithm, which used the centre point of the fingerprint instead of a reference point. Experiments indicate that the hybrid technique performs much better than each algorithm individually.
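
The abstract gives no formulas, but hybrid matchers of this kind are commonly combined at score level; the weighted-sum fusion below is purely hypothetical, not the system's documented method.

```python
def fused_score(minutiae_score, ridge_score, w=0.5):
    """Hypothetical score-level fusion of the two matchers; the weight
    and the acceptance threshold are assumptions, not study values."""
    return w * minutiae_score + (1.0 - w) * ridge_score

# A claimed match might then be accepted above some threshold:
accept = fused_score(0.72, 0.65) >= 0.6
```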

Relevance: 30.00%

Abstract:

Lung modelling has emerged as a useful method for diagnosing lung diseases. Image segmentation is an important part of lung modelling systems. The ill-defined nature of image segmentation makes automated lung modelling difficult. Also, the low resolution of lung images further increases the difficulty of lung image segmentation. It is therefore important to identify a suitable segmentation algorithm that can enhance lung modelling accuracy. This paper investigates six image segmentation algorithms used in medical imaging and their application to lung modelling. The algorithms are: normalised cuts, graph, region growing, watershed, Markov random field, and mean shift. The performance of the six segmentation algorithms is determined through a set of experiments on realistic 2D CT lung images. An experimental procedure is devised to measure the performance of the tested algorithms. The measured segmentation accuracies as well as execution times of the six algorithms are then compared and discussed.
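
The abstract does not name the accuracy measure used; the Dice overlap coefficient below is one common choice for scoring a binary segmentation against ground truth.

```python
import numpy as np

def dice(seg, truth):
    """Dice coefficient between binary masks: 2|A∩B| / (|A| + |B|)."""
    seg, truth = seg.astype(bool), truth.astype(bool)
    denom = seg.sum() + truth.sum()
    return 2.0 * np.logical_and(seg, truth).sum() / denom if denom else 1.0
```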

Relevance: 30.00%

Abstract:

As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell's growth and function [1]. Previous studies in proteomics indicate that the functions of different proteins can be assigned based upon protein structures [2,3]. Knowledge of protein structures gives us an overview of protein fold space and helps us understand the evolutionary principles behind structure. By observing the architectures and topologies of protein families, biological processes can be investigated more directly, with much higher resolution and finer detail. For this reason, the analysis of proteins, their structures, and their interactions with other materials is emerging as an important problem in bioinformatics. However, the determination of protein structures is experimentally expensive and time-consuming, which at present makes scientists largely dependent on sequence, rather than more general structure, to infer the function of a protein. For this reason, data mining technology has been introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications, which lack available data, the protein structure determination problem and its interaction study can draw on a vast amount of biologically relevant information on proteins and their interactions, such as the Protein Data Bank (PDB) [4], the Structural Classification of Proteins (SCOP) database [5], the CATH database [6], UniProt [7], and others. The difficulty of predicting protein structures, especially 3D structures, and the interactions between proteins as shown in Figure 6.1, lies in the computational complexity of the data. Although a large number of approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we introduce a state-of-the-art data mining technique, graph mining, which excels at defining and discovering interesting structural patterns in graph data sets, and we take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). Current graph pattern mining methods are described, and typical algorithms are presented together with their applications in protein structure analysis.
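
Applying graph mining to proteins first requires a graph encoding; one common choice (assumed here, not prescribed by the chapter) is the residue contact graph built from C-alpha coordinates.

```python
import numpy as np

def contact_graph(ca_coords, cutoff=8.0):
    """Residue contact graph from an N x 3 array of C-alpha coordinates:
    edges join residue pairs closer than `cutoff` angstroms (8 A is a
    common convention, assumed here)."""
    d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    n = len(ca_coords)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph
```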

The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamentals of proteins, the publicly accessible protein data resources, and the current state of research in protein analysis; Section 6.3 focuses on graph mining itself; Section 6.4 surveys existing work from the past decade on protein structure analysis using advanced graph mining methods; finally, Section 6.5 concludes and outlines potential future work.

Relevance: 30.00%

Abstract:

Texture synthesis employs neighbourhood matching to generate appropriate new content. Terrain synthesis has the added constraint that new content must be geographically plausible. The profile recognition and polygon breaking algorithm (PPA) [Chang et al. 1998] provides a robust mechanism for characterizing terrain as systems of valley and ridge lines in digital elevation maps. We exploit this to create a terrain characterization metric that is robust, efficient to compute, and sensitive to terrain properties.

Terrain regions are characterized as a minimum spanning tree derived from a graph created from the sample points of the elevation map, with elevations encoded as weights on the edges of the graph. This formulation allows us to provide a single consistent feature definition that is sensitive to the pattern of ridges and valleys in the terrain. Alternative formulations of these weights provide richer characteristic measures, and we provide examples of alternative definitions based on curvature and contour measures.
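
A minimal sketch of this formulation, assuming a 4-connected grid and mean endpoint elevation as the edge weight (one simple choice; the paper explores alternatives such as curvature- and contour-based weights):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def terrain_mst(elev):
    """Minimum spanning tree over a 4-connected grid graph built from an
    elevation map; low-weight edges follow valley floors."""
    h, w = elev.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = elev.ravel().astype(float)
    rows, cols, weights = [], [], []
    for a, b in ((idx[:, :-1], idx[:, 1:]),    # horizontal neighbours
                 (idx[:-1, :], idx[1:, :])):   # vertical neighbours
        a, b = a.ravel(), b.ravel()
        rows.extend(a)
        cols.extend(b)
        weights.extend((flat[a] + flat[b]) / 2.0)  # mean endpoint elevation
    g = coo_matrix((weights, (rows, cols)), shape=(h * w, h * w))
    return minimum_spanning_tree(g)  # sparse matrix of tree edges
```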

We show that the measure is robust, with a significant portion derived directly from information local to the terrain sample. Global terrain characteristics introduce the issue of over- and under-connected valley/ridge lines when working with sub-regions. This is addressed by providing two graph construction strategies, which respectively provide an upper bound on connectivity as a single spanning tree, and a lower bound as a forest of trees.

Efficient minimum spanning tree algorithms are adapted to the context of terrain data and are shown to provide substantially better performance than previous PPA implementations. In particular, these are able to characterize valley and ridge behaviour at every point even in large elevation maps, providing a measure sensitive to terrain features at all scales.

The resulting graph-based formulation provides an efficient and elegant algorithm for characterizing terrain features. The measure can be calculated efficiently, is robust under changes of neighbourhood position, size and resolution, and the hybrid measure is sensitive to terrain features both locally and globally.

Relevance: 30.00%

Abstract:

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm for massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures, including sequences and sorted k-mer lists, on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
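
The sorted k-mer lists at the heart of these data structures can be sketched in a few lines; k = 11 here is an arbitrary illustrative choice, and sorted lists from different genomes can then be merged linearly to find shared seeds.

```python
def sorted_kmer_list(seq, k=11):
    """All (k-mer, offset) pairs of a sequence, sorted lexicographically."""
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    kmers.sort()
    return kmers

# Example: 4-mers shared by two toy sequences.
a = dict(sorted_kmer_list("ACGTACGGA", k=4))
b = dict(sorted_kmer_list("TTACGTAC", k=4))
print(sorted(set(a) & set(b)))  # k-mers present in both sequences
```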

Relevance: 30.00%

Abstract:

A critical question in data mining is whether we can always trust what is discovered by a data mining system unconditionally. The answer is obviously no. If not, when can we trust a discovery? What factors affect the reliability of a discovery, and how do they affect it? These are some of the interesting questions to be investigated. In this chapter we first provide a definition and measurements of reliability and analyse the factors that affect it. We then examine the impact of model complexity, weak links, varying sample sizes, and the ability of different learners on the reliability of graphical model discovery. The experimental results reveal that (1) the larger the sample size used for discovery, the higher the reliability obtained; (2) the stronger a graph link is, the easier the discovery will be and thus the higher the reliability it can achieve; and (3) the complexity of a graph also plays an important role in discovery: the higher the complexity of a graph, the more difficult it is to induce the graph and the lower the reliability will be. We also examined the performance differences between discovery algorithms, which reveals the impact of the discovery process. The experimental results show the superior reliability and robustness of the MML method over standard significance tests in the recovery of graph links with small samples and weak links.
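
The sample-size effect is easy to reproduce in a toy simulation: generate data from a single weak link and count how often a simple correlation significance test (a stand-in for the discovery algorithm, not the chapter's MML method) recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)

def link_recovered(n, strength):
    """One dataset of size n from x -> y; returns True if the link is
    detected by an (approximate) significance test at alpha = 0.05."""
    x = rng.normal(size=n)
    y = strength * x + rng.normal(size=n)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1 - r * r))  # t-statistic for H0: r = 0
    return abs(t) > 1.96                    # normal approximation

for n in (20, 100, 500):
    rate = np.mean([link_recovered(n, strength=0.2) for _ in range(200)])
    print(f"n={n}: weak link recovered in {rate:.0%} of runs")
```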

Relevance: 30.00%

Abstract:

This paper describes object-centered symbolic representation and distributed matching strategies for 3D objects in the schematic form in which they occur in engineering drawings and maps. The object-centered representation has a hierarchical structure and is constructed from symbolic representations of schematics. With this representation, two independent schematics representing the same object can be matched. We also consider matching strategies using distributed algorithms. Object recognition is carried out with two matching methods: (1) matching between an object model and observed data at the lowest level of the hierarchy, and (2) constraint propagation. The first is carried out with symbolic Hopfield-type neural networks and the second is achieved via hierarchical winner-takes-all algorithms.
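
Winner-takes-all matching can be sketched as greedy suppression over a model-node by data-node score matrix; the flat version below is a simple stand-in for the hierarchical scheme the paper describes.

```python
import numpy as np

def winner_takes_all(scores):
    """Repeatedly pick the globally best (model, data) pair and suppress
    its row and column, so each node is matched at most once."""
    s = scores.astype(float).copy()
    matches = []
    while np.isfinite(s).any():
        i, j = np.unravel_index(np.argmax(s), s.shape)
        matches.append((int(i), int(j), float(s[i, j])))
        s[i, :] = -np.inf
        s[:, j] = -np.inf
    return matches
```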