982 results for graph matching algorithms


Relevance: 80.00%

Publisher:

Abstract:

In the first part of the thesis we explore three fundamental questions that arise naturally when we conceive a machine learning scenario where the training and test distributions can differ. Contrary to conventional wisdom, we show that mismatched training and test distributions can in fact yield better out-of-sample performance. This optimal performance is obtained by training with the dual distribution, which depends on the test distribution set by the problem but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, and how to approximate it in a practical scenario. The benefits of using this distribution are exemplified on both synthetic and real data sets.

In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the effect of weights on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm, which determines, for a given set of weights, whether out-of-sample performance will improve in a practical setting. This is necessary because the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples.
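
For concreteness, the reweighting idea above follows the standard importance-weighting recipe for covariate shift: each training point receives a weight proportional to the ratio of the target (dual or test) density to the training density. The sketch below is a minimal illustration of that recipe, not the thesis's Targeted Weighting algorithm; the Gaussian densities and the weighted least-squares fit are hypothetical placeholders.

```python
import numpy as np

def importance_weights(x_train, p_target, p_train, clip=50.0):
    """Importance weights w(x) = p_target(x) / p_train(x), clipped to limit variance."""
    w = p_target(x_train) / np.maximum(p_train(x_train), 1e-12)
    return np.minimum(w, clip)

def weighted_least_squares(x, y, w):
    """Fit y ~ [1, x] by weighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Hypothetical example: train on N(0, 1), evaluate on N(1, 1).
rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, 200)
y_tr = np.sin(x_tr) + 0.1 * rng.normal(size=200)
p_train = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
p_test = lambda x: np.exp(-0.5 * (x - 1.0)**2) / np.sqrt(2 * np.pi)

w = importance_weights(x_tr, p_test, p_train)
coef = weighted_least_squares(x_tr, y_tr, w)
print("weighted linear fit:", coef)
```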

Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance on a large real dataset, the Netflix dataset. Their low computational complexity is the main advantage over previous algorithms proposed in the covariate shift literature.
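
As a rough sketch of what "matching the training set to a desired distribution" can look like, the snippet below greedily pairs each point of a target sample with its nearest unused training point and keeps the selected subset. This is only an assumed illustration of the general idea, not one of the thesis's algorithms, and the data are synthetic placeholders.

```python
import numpy as np

def greedy_match(train_x, target_x):
    """Greedily pair each target point with its nearest unused training point.

    Returns indices into train_x; the selected subset roughly follows the
    target distribution. Brute force, O(n_target * n_train) distance evaluations.
    """
    available = np.ones(len(train_x), dtype=bool)
    chosen = []
    for t in target_x:
        d = np.linalg.norm(train_x - t, axis=1)
        d[~available] = np.inf
        i = int(np.argmin(d))
        available[i] = False
        chosen.append(i)
    return np.array(chosen)

# Hypothetical example: pull a subset of the training cloud toward the target cloud.
rng = np.random.default_rng(1)
train_x = rng.normal(0.0, 1.0, size=(1000, 2))
target_x = rng.normal(0.7, 1.0, size=(200, 2))
idx = greedy_match(train_x, target_x)
print("matched subset mean:", train_x[idx].mean(axis=0))  # close to the target mean
```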

In the second part of the thesis we apply Machine Learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, and we develop a system that allows behavior in videos of animals to be analyzed with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), detects movemes, actions, and stories from time series describing the positions of animals in videos. The method summarizes the data and provides biologists with a mathematical tool to test new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate groups of animals, for example according to their genetic line.

Relevance: 80.00%

Publisher:

Abstract:

This report is an introduction to the concept of treewidth, a property of graphs that has important implications in algorithms. Some basic concepts of graph theory are presented in the first chapter for readers who are not familiar with the notation. In Chapter 2, the definition of treewidth and several different ways of characterizing it are explained. The last two chapters focus on the algorithmic implications of treewidth, which are very relevant in Computer Science. An algorithm to compute the treewidth of a graph is presented, and its result can later be applied to many other problems in graph theory, such as those introduced in the last chapter.
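
To make the notion concrete, the sketch below computes an upper bound on treewidth with the classical min-degree elimination heuristic (repeatedly eliminate a minimum-degree vertex and turn its neighbourhood into a clique). This is a standard heuristic chosen here for brevity; it is not necessarily the exact algorithm presented in the report.

```python
def treewidth_upper_bound(adj):
    """Min-degree elimination heuristic: an upper bound on treewidth.

    adj: dict mapping each vertex to a set of neighbours (undirected graph).
    Repeatedly eliminate a minimum-degree vertex, turning its neighbourhood
    into a clique; the largest neighbourhood size seen is the bound.
    """
    g = {v: set(ns) for v, ns in adj.items()}
    width = 0
    while g:
        v = min(g, key=lambda u: len(g[u]))   # minimum-degree vertex
        nbrs = g[v]
        width = max(width, len(nbrs))
        for a in nbrs:                        # make the neighbourhood a clique
            g[a] |= nbrs - {a}
            g[a].discard(v)
        del g[v]
    return width

# Example: a cycle on 5 vertices has treewidth 2.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(treewidth_upper_bound(cycle5))  # 2
```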

Relevance: 80.00%

Publisher:

Abstract:

This paper examines different ways of measuring similarity between software design models for the purpose of software reuse. Current approaches to this problem are discussed, and a set of suitable similarity metrics is proposed and evaluated. Work on the optimisation of weights to increase the competence of a CBR system is presented. A graph matching algorithm and associated metrics capturing the structural similarity between UML class diagrams are presented and demonstrated through an example case.
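
As a toy illustration of structural similarity between class diagrams, the sketch below models each diagram as a set of class names plus a set of relationship triples and averages their Jaccard overlaps. The metric and the example diagrams are assumptions for illustration only; they are not the metrics or weighting scheme proposed in the paper.

```python
def diagram_similarity(classes_a, relations_a, classes_b, relations_b):
    """Toy structural similarity between two UML class diagrams.

    classes_*: set of class names; relations_*: set of (source, kind, target)
    triples such as ("Order", "association", "Customer").
    Returns the average Jaccard overlap of the two sets.
    """
    def jaccard(x, y):
        return len(x & y) / len(x | y) if x | y else 1.0
    return 0.5 * (jaccard(classes_a, classes_b) + jaccard(relations_a, relations_b))

# Hypothetical example diagrams.
a_classes = {"Customer", "Order", "Invoice"}
a_rels = {("Order", "association", "Customer"), ("Invoice", "dependency", "Order")}
b_classes = {"Customer", "Order", "Shipment"}
b_rels = {("Order", "association", "Customer")}
print(diagram_similarity(a_classes, a_rels, b_classes, b_rels))  # 0.5
```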

Relevance: 80.00%

Publisher:

Abstract:

Measuring the structural similarity of graphs is a challenging and outstanding problem. Most of the classical approaches, the so-called exact graph matching methods, are based on graph or subgraph isomorphism relations of the underlying graphs. In contrast to these methods, in this paper we introduce a novel approach to measuring the structural similarity of directed and undirected graphs that is mainly based on margins of feature vectors representing graphs. We introduce novel graph similarity and dissimilarity measures, provide some of their properties, and analyze their algorithmic complexity. We find that the computational complexity of our measures is polynomial in the graph size and hence significantly better than that of classical methods such as exact graph matching, which are NP-complete. Numerically, we provide some examples of our measure and compare the results with the well-known graph edit distance.
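
The contrast drawn above is between exact matching (isomorphism tests) and polynomial-time measures built from feature vectors. The snippet below is a deliberately simple stand-in for the latter idea: it represents each graph by its degree histogram and compares the histograms with cosine similarity. It is not the margin-based measure of the paper, only an assumed illustration of a feature-vector comparison that runs in polynomial time.

```python
import numpy as np

def degree_histogram(adj, max_degree):
    """Feature vector: normalized histogram of vertex degrees, padded to max_degree."""
    degrees = [len(ns) for ns in adj.values()]
    hist = np.bincount(degrees, minlength=max_degree + 1).astype(float)
    return hist / hist.sum()

def graph_similarity(adj_a, adj_b):
    """Cosine similarity of degree histograms: a polynomial-time, inexact measure."""
    k = max(max(len(ns) for ns in adj_a.values()),
            max(len(ns) for ns in adj_b.values()))
    fa, fb = degree_histogram(adj_a, k), degree_histogram(adj_b, k)
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb)))

# Example: a 4-cycle versus a 4-path (undirected adjacency dictionaries).
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(graph_similarity(cycle, cycle))  # 1.0
print(graph_similarity(cycle, path))   # < 1.0
```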

Relevance: 80.00%

Publisher:

Abstract:

Doctoral thesis, Informatics (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2014.

Relevance: 80.00%

Publisher:

Abstract:

Dissertation presented to the Escola Superior de Comunicação Social in partial fulfilment of the requirements for the degree of Master in Audiovisual and Multimedia.

Relevance: 80.00%

Publisher:

Abstract:

Most current-generation Wireless Sensor Network (WSN) nodes are equipped with multiple sensors of various types, and therefore support for multi-tasking and multiple concurrent applications is becoming increasingly common. This trend has been fostering the design of WSNs that allow several concurrent users to deploy applications with dissimilar requirements. In this paper, we extend the advantages of a holistic programming scheme by designing a novel compiler-assisted scheduling approach (called REIS) that is able to identify and eliminate redundancies across applications. To achieve this useful high-level optimization, we model each user application as a linear sequence of executable instructions. We show how well-known string-matching algorithms such as the Longest Common Subsequence (LCS) and the Shortest Common Supersequence (SCS) can be used to produce an optimal merged monolithic sequence of the deployed applications that takes embedded scheduling information into account. We show that our approach can help achieve about 60% average energy savings in processor usage compared with the normal execution of concurrent applications.
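
To show what the LCS/SCS machinery does here, the sketch below computes an LCS by dynamic programming and then merges two instruction sequences into their shortest common supersequence around it. The one-letter "instruction" strings are hypothetical, and the sketch ignores the embedded scheduling information that REIS takes into account.

```python
def longest_common_subsequence(a, b):
    """Classic O(len(a) * len(b)) dynamic program; returns one LCS."""
    m, n = len(a), len(b)
    dp = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + a[i]
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[m][n]

def shortest_common_supersequence(a, b):
    """Merge a and b around their LCS: a shortest string containing both."""
    lcs, out, i, j = longest_common_subsequence(a, b), [], 0, 0
    for c in lcs:
        while a[i] != c:
            out.append(a[i]); i += 1
        while b[j] != c:
            out.append(b[j]); j += 1
        out.append(c); i += 1; j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return "".join(out)

# Two hypothetical instruction sequences (one symbol per instruction).
app1, app2 = "RSRWTS", "RSWTWS"
merged = shortest_common_supersequence(app1, app2)
print(merged, len(merged), "vs", len(app1) + len(app2))
```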

Relevance: 80.00%

Publisher:

Abstract:

Underground scenarios are among the most challenging environments for accurate and precise 3D mapping, where hostile conditions such as the absence of Global Positioning System coverage, extreme lighting variations, and geometrically smooth surfaces may be expected. So far, the state-of-the-art methods in underground modelling remain restricted to environments in which pronounced geometric features are abundant. This limitation is a consequence of the scan matching algorithms used to solve the localization and registration problems. This paper contributes to the expansion of the modelling capabilities to structures characterized by uniform geometry and smooth surfaces, as is the case of road and train tunnels. To achieve that, we combine state-of-the-art techniques from mobile robotics and propose a method for 6DOF platform positioning in such scenarios, which is later used for the environment modelling. A visual monocular Simultaneous Localization and Mapping (MonoSLAM) approach based on the Extended Kalman Filter (EKF), complemented by the introduction of inertial measurements in the prediction step, allows our system to localize itself over long distances using exclusively sensors carried on board a mobile platform. By feeding the Extended Kalman Filter with inertial data we were able to overcome the major problem of MonoSLAM implementations, known as scale-factor ambiguity. Despite extreme lighting variations, reliable visual features were extracted with the SIFT algorithm and inserted directly into the EKF according to the Inverse Depth Parametrization. Wrong frame-to-frame feature matches were rejected through 1-Point RANSAC (Random Sample Consensus). The developed method was tested on a dataset acquired inside a road tunnel, and the navigation results were compared with a ground truth obtained by post-processing high-grade Inertial Navigation System and L1/L2 RTK-GPS measurements acquired outside the tunnel. Results from the localization strategy are presented and analyzed.
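
The key ingredient above is feeding inertial measurements into the EKF prediction step. As a heavily simplified sketch of that idea only, the snippet below runs the prediction step of a one-dimensional constant-acceleration EKF driven by IMU accelerations; the state, noise values, and data are assumptions and bear no relation to the full MonoSLAM state used in the paper.

```python
import numpy as np

def ekf_predict(x, P, accel_meas, dt, accel_noise_var=0.05):
    """Prediction step of a toy 1-D EKF with state [position, velocity].

    Measured acceleration from an IMU drives the motion model, which is how
    inertial data can constrain scale in a monocular SLAM pipeline.
    """
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])            # state transition
    B = np.array([0.5 * dt**2, dt])       # control (acceleration) input
    x_pred = F @ x + B * accel_meas
    Q = np.outer(B, B) * accel_noise_var  # process noise from accel uncertainty
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

# Hypothetical usage: constant 0.2 m/s^2 acceleration for one second at 100 Hz.
x, P = np.zeros(2), np.eye(2) * 1e-3
for _ in range(100):
    x, P = ekf_predict(x, P, accel_meas=0.2, dt=0.01)
print("predicted position, velocity:", x)  # roughly [0.1, 0.2]
```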

Relevance: 80.00%

Publisher:

Abstract:

The Kineticist's Workbench is a program that simulates chemical reaction mechanisms by predicting, generating, and interpreting numerical data. Prior to simulation, it analyzes a given mechanism to predict that mechanism's behavior; it then simulates the mechanism numerically; and afterward, it interprets and summarizes the data it has generated. In performing these tasks, the Workbench uses a variety of techniques: graph-theoretic algorithms (for analyzing mechanisms), traditional numerical simulation methods, and algorithms that examine simulation results and reinterpret them in qualitative terms. The Workbench thus serves as a prototype for a new class of scientific computational tools: tools that provide symbiotic collaborations between qualitative and quantitative methods.
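
The numerical-simulation component amounts to integrating the mechanism's rate equations. The sketch below does this for an assumed toy mechanism (A -> B -> C with first-order kinetics) using scipy's solve_ivp; it illustrates only the quantitative half of the Workbench, not its qualitative, graph-theoretic analysis.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy mechanism: A -> B -> C with first-order rate constants k1 and k2.
k1, k2 = 1.0, 0.5

def rates(t, y):
    a, b, c = y
    return [-k1 * a,            # d[A]/dt
            k1 * a - k2 * b,    # d[B]/dt
            k2 * b]             # d[C]/dt

sol = solve_ivp(rates, (0.0, 10.0), [1.0, 0.0, 0.0], t_eval=np.linspace(0, 10, 6))
for t, (a, b, c) in zip(sol.t, sol.y.T):
    print(f"t={t:4.1f}  [A]={a:.3f}  [B]={b:.3f}  [C]={c:.3f}")
```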

Relevance: 80.00%

Publisher:

Abstract:

In this class, we will discuss the nature of network evolution and some selected network processes. We will also cover graph generation algorithms that produce networks with different interesting characteristics. Optional: The Structure and Function of Complex Networks (Chapter 8), M. E. J. Newman, SIAM Review 45, 167–256 (2003). Optional: Emergence of Scaling in Random Networks, A.-L. Barabási and R. Albert, Science 286, 509 (1999).
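
One of the generation algorithms covered in the Barabási and Albert reading is preferential attachment. The sketch below is a simplified, assumed implementation of that growth process: new nodes attach to existing nodes with probability proportional to degree, which produces the heavy-tailed degree distributions discussed in class.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment (simplified Barabasi-Albert model).

    Starts from a small clique of m+1 nodes; each new node attaches to m
    existing nodes chosen with probability proportional to their degree.
    Returns an edge list.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # 'targets' repeats each endpoint once per incident edge, so uniform
    # sampling from it is degree-proportional sampling.
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend([new, t])
    return edges

edges = barabasi_albert(n=1000, m=2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print("max degree:", max(degree.values()), "min degree:", min(degree.values()))
```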

Relevance: 80.00%

Publisher:

Abstract:

A novel Linear Hashtable Method Predicted Hexagonal Search (LHMPHS) for block-based motion compensation is proposed. Fast block matching algorithms use the origin as the initial search center, which often does not track motion very well. To improve the accuracy of fast block matching algorithms (BMAs), we employ a predicted starting search point that reflects the motion trend of the current block. The predicted search center is found to be closer to the global minimum, so center-biased BMAs can be used to find the motion vector more efficiently. The performance of the algorithm is evaluated on standard video sequences with respect to three important metrics. The results show that the proposed algorithm enhances the accuracy of current hexagonal algorithms and is better than Full Search, Logarithmic Search, etc.
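
To illustrate block matching around a predicted search center, the sketch below minimizes the sum of absolute differences (SAD) over a small window centered on a predicted displacement. It uses an exhaustive window rather than the hexagonal pattern, and the frames and parameters are synthetic; it is meant only to show the role of the predicted center, not to reproduce LHMPHS.

```python
import numpy as np

def sad(block, candidate):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(block.astype(int) - candidate.astype(int)).sum()

def block_match(ref, cur, top, left, size=16, center=(0, 0), radius=4):
    """Search a small window around a predicted center for the best motion vector.

    Returns the displacement (dy, dx) minimizing the SAD for the block of the
    current frame at (top, left); 'center' is the predicted displacement.
    """
    block = cur[top:top + size, left:left + size]
    best, best_mv = None, (0, 0)
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue
            cost = sad(block, ref[y:y + size, x:x + size])
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv

# Hypothetical frames: the current frame is the reference shifted by (2, 3).
rng = np.random.default_rng(2)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(-2, -3), axis=(0, 1))
print(block_match(ref, cur, top=16, left=16, center=(0, 0)))  # expect (2, 3)
```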

Relevance: 80.00%

Publisher:

Abstract:

Generating quadrilateral meshes is a highly non-trivial task, as design decisions are frequently driven by specific application demands. Automatic techniques can optimize objective quality metrics, such as mesh regularity, orthogonality, alignment, and adaptivity; however, they cannot make subjective design decisions. There are a few quad meshing approaches that offer some mechanisms to include the user in the mesh generation process; however, these techniques either require a large amount of user interaction or do not provide necessary or easy-to-use inputs. Here, we propose a template-based approach for generating quad-only meshes from triangle surfaces. Our approach offers a flexible mechanism to allow external input, through the definition of alignment features that are respected during the mesh generation process. While allowing user inputs to support subjective design decisions, our approach also takes into account objective quality metrics to produce semi-regular, quad-only meshes that align well with desired surface features.
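
As an example of one of the objective quality metrics mentioned above, the sketch below scores a quad's orthogonality as the largest deviation of its interior angles from 90 degrees. The formulation is an assumption chosen for simplicity; the paper's actual metrics may be defined differently.

```python
import numpy as np

def quad_orthogonality(quad):
    """Orthogonality score of a quadrilateral given its 4 corners (in order).

    Returns the maximum deviation of an interior angle from 90 degrees;
    0 means a perfect rectangle, larger values mean lower quality.
    """
    quad = np.asarray(quad, dtype=float)
    worst = 0.0
    for i in range(4):
        a, b, c = quad[i - 1], quad[i], quad[(i + 1) % 4]
        u, v = a - b, c - b
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
        worst = max(worst, abs(angle - 90.0))
    return worst

print(quad_orthogonality([(0, 0), (1, 0), (1, 1), (0, 1)]))      # 0.0 (square)
print(quad_orthogonality([(0, 0), (1, 0), (1.4, 1.0), (0, 1)]))  # > 0 (skewed quad)
```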

Relevance: 80.00%

Publisher:

Abstract:

Signature-based malware detection systems have been a much used response to the pervasive problem of malware. Identification of malware variants is essential to a detection system and is made possible by identifying invariant characteristics in related samples. To classify packed and polymorphic malware, this paper proposes a novel system, named Malwise, that uses a fast application-level emulator to reverse the code packing transformation and two flowgraph matching algorithms to perform classification. An exact flowgraph matching algorithm is employed that uses string-based signatures and is able to detect malware with near real-time performance. Additionally, a more effective approximate flowgraph matching algorithm is proposed that uses the decompilation technique of structuring to generate string-based signatures amenable to the string edit distance. We use real and synthetic malware to demonstrate the effectiveness and efficiency of Malwise. Using more than 15,000 real malware samples collected from honeypots, the effectiveness is validated by showing that there is an 88 percent probability that new malware is detected as a variant of existing malware. The efficiency is demonstrated on a smaller sample set, of which 86 percent of the samples can be classified in under 1.3 seconds.
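
The approximate matching step above compares string-based flowgraph signatures with the string edit distance. The sketch below implements the standard Levenshtein distance and derives a normalized similarity from it; the signature strings are hypothetical stand-ins, and the structuring-based signature generation itself is not reproduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def signature_similarity(sig_a, sig_b):
    """Similarity in [0, 1]: 1 means identical signatures."""
    if not sig_a and not sig_b:
        return 1.0
    return 1.0 - edit_distance(sig_a, sig_b) / max(len(sig_a), len(sig_b))

# Hypothetical structured signatures for two flowgraphs.
sig1 = "W{IE{R}E{}}R"
sig2 = "W{IE{R}E{B}}R"
print(signature_similarity(sig1, sig2))  # close to 1: likely variants
```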

Relevance: 80.00%

Publisher:

Abstract:

This article introduces the software program EthoSeq, which is designed to extract probabilistic behavioral sequences (tree-generated sequences, or TGSs) from observational data and to prepare a TGS-species matrix for phylogenetic analysis. The program uses graph-theory algorithms to automatically detect behavioral patterns within the observational sessions. It includes filtering tools to adjust the search procedure to user-specified statistical needs. Preliminary analyses of data sets such as grooming sequences in birds and foraging tactics in spiders uncover a large number of TGSs, which together yield single phylogenetic trees. An example of the use of the program is our analysis of felid grooming sequences, in which we obtained 1,386 felid grooming TGSs for seven species, resulting in a single phylogeny. These results show that behavior is indeed useful in phylogenetic analysis. EthoSeq simplifies and automates such analyses, uncovers many of the hidden patterns in long behavioral sequences, and prepares the data for further analysis with standard phylogenetic programs. We hope it will encourage many empirical studies on the evolution of behavior.
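
EthoSeq's tree-generated sequences come from a more elaborate graph-based search, but the underlying idea of estimating probabilistic sequences from observation sessions can be sketched with a first-order transition graph, as below. The behaviors and sessions are invented, and the greedy most-probable walk is only an assumed simplification of the TGS extraction.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from observed behavior sequences.

    sessions: list of lists of behavior labels, one list per observation session.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
            for a, nexts in counts.items()}

def likely_sequence(probs, start, length):
    """Follow the most probable transition from 'start' for up to 'length' steps."""
    seq = [start]
    while len(seq) < length and seq[-1] in probs:
        seq.append(max(probs[seq[-1]], key=probs[seq[-1]].get))
    return seq

# Hypothetical grooming observations.
sessions = [["lick", "scratch", "lick", "rub", "lick", "scratch"],
            ["scratch", "lick", "rub", "lick", "rub"]]
probs = transition_probabilities(sessions)
print(likely_sequence(probs, "lick", 4))
```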

Relevance: 80.00%

Publisher:

Abstract:

Genetic variation within and among accessions of the genus Arachis representing the sections Extranervosae, Caulorrhizae, Heteranthae, and Triseminatae was evaluated using RFLP and RAPD markers. RAPD markers revealed a higher level of genetic diversity than did RFLP markers, both within and among the species evaluated. Phenograms based on various band-matching algorithms revealed three major clusters of similarity among the sections evaluated. The first group included the species from section Extranervosae; the second group consisted of sections Triseminatae, Caulorrhizae, and Heteranthae; and the third group consisted of one accession of Arachis hypogaea, which had been included as a representative of section Arachis. The phenograms obtained from the RAPD and RFLP data were similar but not identical. Arachis pietrarellii, assayed only by RAPD, showed a high degree of genetic similarity with Arachis villosulicarpa, supporting the hypothesis that these two species are closely related. It was also shown that accession V 7786, previously considered to be Arachis sp. aff. pietrarellii and assayed using both RFLPs and RAPDs, is possibly a new species from section Extranervosae, yet very distinct from A. pietrarellii.
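
Band-matching coefficients such as the Dice (Nei-Li) coefficient are a common way to turn presence/absence band profiles into pairwise similarities before building a phenogram. The sketch below computes that coefficient on invented RAPD profiles; it is not claimed to be the exact coefficient or software used in this study.

```python
def dice_similarity(bands_a, bands_b):
    """Dice (Nei-Li) coefficient between two presence/absence band profiles.

    bands_*: sets of band identifiers (e.g., primer plus fragment size) scored
    as present in each accession. Returns 2 * shared / (total_a + total_b).
    """
    shared = len(bands_a & bands_b)
    total = len(bands_a) + len(bands_b)
    return 2.0 * shared / total if total else 1.0

# Hypothetical RAPD band profiles for three accessions.
acc1 = {"OPA01_520", "OPA01_780", "OPA03_410", "OPB07_900"}
acc2 = {"OPA01_520", "OPA03_410", "OPB07_900", "OPB07_650"}
acc3 = {"OPA01_780", "OPC11_300"}
print(dice_similarity(acc1, acc2))  # relatively high
print(dice_similarity(acc1, acc3))  # lower
```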