833 results for Edit distance
Abstract:
The boxicity (cubicity) of a graph G, denoted by box(G) (respectively cub(G)), is the minimum integer k such that G can be represented as the intersection graph of axis-parallel boxes (cubes) in ℝ^k. The problem of computing boxicity (cubicity) is known to be inapproximable in polynomial time within an O(n^{0.5−ε}) factor for any ε > 0, unless NP = ZPP, even for graph classes like bipartite, co-bipartite and split graphs. We prove that if a graph G on n vertices has a clique on n − k vertices, then box(G) can be computed in time n^2 2^{O(k^2 log k)}. Using this fact, various FPT approximation algorithms for boxicity are derived. The parameter used is the vertex (or edge) edit distance of the input graph from certain graph families of bounded boxicity, like interval graphs and planar graphs. Using the same fact, we also derive an O(n√(log log n)/√(log n)) factor approximation algorithm for computing boxicity, which, to our knowledge, is the first o(n) factor approximation algorithm for the problem. We also present an FPT approximation algorithm for computing the cubicity of graphs, with vertex cover number as the parameter.
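As a concrete illustration of the definition above (not code from the paper): a set of axis-parallel boxes in ℝ^k represents G exactly when boxes intersect iff the corresponding vertices are adjacent, and two axis-parallel boxes intersect precisely when their intervals overlap in every dimension. A minimal Python sketch:

```python
from itertools import combinations

def boxes_intersect(a, b):
    """Axis-parallel boxes intersect iff their intervals overlap in every
    dimension. Each box is a list of (lo, hi) intervals, one per dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

def intersection_graph(boxes):
    """Edge set of the intersection graph of the given boxes."""
    return {(i, j) for i, j in combinations(range(len(boxes)), 2)
            if boxes_intersect(boxes[i], boxes[j])}

# Two boxes in R^2 that overlap, and one disjoint from both.
boxes = [[(0, 2), (0, 2)], [(1, 3), (1, 3)], [(5, 6), (5, 6)]]
print(intersection_graph(boxes))  # {(0, 1)}
```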
Abstract:
A system is described that tracks moving objects in a video dataset so as to extract a representation of the objects' 3D trajectories. The system then finds hierarchical clusters of similar trajectories in the video dataset. Objects' motion trajectories are extracted via an EKF formulation that provides each object's 3D trajectory up to a constant factor. To increase accuracy when occlusions occur, multiple tracking hypotheses are followed. For trajectory-based clustering and retrieval, a modified version of edit distance, called longest common subsequence (LCSS), is employed. Similarities are computed between projections of trajectories on coordinate axes. Trajectories are grouped based on these similarities, using an agglomerative clustering algorithm. To check the validity of the approach, experiments using real data were performed.
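A minimal sketch of the LCSS measure described above, applied to one coordinate-axis projection of two trajectories; the matching threshold eps and time window delta are the usual LCSS parameters (the names here are ours, not the paper's):

```python
def lcss(P, Q, eps, delta):
    """Length of the longest common subsequence of 1-D trajectories P and Q,
    where samples match if they differ by at most eps in value and their
    indices differ by at most delta in time."""
    n, m = len(P), len(Q)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(P[i-1] - Q[j-1]) <= eps and abs(i - j) <= delta:
                L[i][j] = L[i-1][j-1] + 1
            else:
                L[i][j] = max(L[i-1][j], L[i][j-1])
    return L[n][m]

def lcss_similarity(P, Q, eps, delta):
    """Normalized LCSS similarity in [0, 1]."""
    return lcss(P, Q, eps, delta) / min(len(P), len(Q))
```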
Abstract:
Measuring the structural similarity of graphs is a challenging and outstanding problem. Most classical approaches of the so-called exact graph matching methods are based on graph or subgraph isomorphism relations between the underlying graphs. In contrast to these methods, in this paper we introduce a novel approach to measuring the structural similarity of directed and undirected graphs that is mainly based on margins of feature vectors representing graphs. We introduce novel graph similarity and dissimilarity measures, provide some of their properties and analyze their algorithmic complexity. We find that the computational complexity of our measures is polynomial in the graph size and, hence, significantly better than that of classical methods, e.g. exact graph matching, which are NP-complete. Numerically, we provide some examples of our measure and compare the results with the well-known graph edit distance.
Abstract:
Quantifying the similarity between two trajectories is a fundamental operation in the analysis of spatio-temporal databases. While a number of distance functions exist, the recent shift in the dynamics of the trajectory generation procedure violates one of their core assumptions: a consistent and uniform sampling rate. In this paper, we formulate a robust distance function called Edit Distance with Projections (EDwP) to match trajectories under inconsistent and variable sampling rates through dynamic interpolation. This is achieved by deploying the idea of projections, which goes beyond matching only the sampled points while aligning trajectories. To enable efficient trajectory retrievals using EDwP, we design an index structure called TrajTree. TrajTree derives its pruning power from a unique combination of bounding boxes and Lipschitz embedding. Extensive experiments on real trajectory databases demonstrate EDwP to be up to 5 times more accurate than state-of-the-art distance functions. Additionally, TrajTree increases the efficiency of trajectory retrievals by up to an order of magnitude over existing techniques.
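The abstract does not spell out the EDwP recurrence, so the sketch below is only a plain edit distance over sampled points with Euclidean replacement cost and a hypothetical fixed gap penalty; actual EDwP additionally interpolates "projection" points onto trajectory segments, which this baseline omits:

```python
import math

def point_edit_distance(P, Q):
    """Edit distance between two point sequences: replacement costs the
    Euclidean distance between points, insert/delete costs a fixed gap
    penalty. A simplified baseline, NOT the full EDwP with projections."""
    GAP = 1.0  # hypothetical insert/delete cost
    n, m = len(P), len(Q)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * GAP
    for j in range(1, m + 1):
        D[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i-1][j-1] + math.dist(P[i-1], Q[j-1]),  # replace
                          D[i-1][j] + GAP,                          # delete
                          D[i][j-1] + GAP)                          # insert
    return D[n][m]
```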
Abstract:
We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear programming formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance.
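For reference, the Levenshtein (edit) distance that the abstract compares against is the classic dynamic program; a compact two-row Python version:

```python
def levenshtein(a, b):
    """Levenshtein (edit) distance: minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j-1] + 1,              # insert cb
                           prev[j-1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3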
Abstract:
XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
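For background (notation ours, not the paper's): tree edit distance over ordered labeled trees is commonly computed from the standard forest recurrence, e.g. as in Zhang–Shasha, with per-operation costs c_del, c_ins, c_rel. With v, w the rightmost roots of forests F and G, T_v the subtree rooted at v, and F_v the forest of v's children:

```latex
\begin{align*}
d(\emptyset,\emptyset) &= 0\\
d(F,\emptyset) &= d(F - v,\,\emptyset) + c_{\mathrm{del}}(v)\\
d(\emptyset,G) &= d(\emptyset,\,G - w) + c_{\mathrm{ins}}(w)\\
d(F,G) &= \min \begin{cases}
  d(F - v,\,G) + c_{\mathrm{del}}(v)\\
  d(F,\,G - w) + c_{\mathrm{ins}}(w)\\
  d(F_v,\,G_w) + d(F - T_v,\,G - T_w) + c_{\mathrm{rel}}(v,w)
\end{cases}
\end{align*}
```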
Abstract:
Information Retrieval systems normally have to work with rather heterogeneous sources, such as Web sites or documents from Optical Character Recognition tools. The correct conversion of these sources into flat text files is not a trivial task, since noise may easily be introduced as a result of spelling or typesetting errors. Interestingly, this is not a great drawback when the size of the corpus is sufficiently large, since redundancy helps to overcome noise problems. However, noise becomes a serious problem in restricted-domain Information Retrieval, especially when the corpus is small and has little or no redundancy. This paper devises an approach which adds noise-tolerance to Information Retrieval systems. A set of experiments carried out in the agricultural domain proves the effectiveness of the approach.
Abstract:
We analyze an approach to a similarity preserving coding of symbol sequences based on neural distributed representations and show that it can be viewed as a metric embedding process.
Abstract:
This thesis addressed the problem of risk analysis in mental healthcare, with respect to the GRiST project at Aston University. That project provides a risk-screening tool based on the knowledge of 46 experts, captured as mind maps that describe relationships between risks and patterns of behavioural cues. Mind mapping, though, fails to impose control over content, and is not considered to formally represent knowledge. In contrast, this thesis treated GRiST's mind maps as a rich knowledge base in need of refinement; that process drew on existing techniques for designing databases and knowledge bases. Identifying well-defined mind map concepts, though, was hindered by spelling mistakes, and by ambiguity and lack of coverage in the tools used for researching words. A novel use of the Edit Distance overcame those problems, by assessing similarities between mind map texts, and between spelling mistakes and suggested corrections. That algorithm further identified stems, the shortest text string found in related word-forms. As opposed to existing approaches' reliance on built-in linguistic knowledge, this thesis devised a novel, more flexible text-based technique. An additional tool, Correspondence Analysis, found patterns in word usage that allowed machines to determine likely intended meanings for ambiguous words. Correspondence Analysis further produced clusters of related concepts, which in turn drove the automatic generation of novel mind maps. Such maps underpinned adjuncts to the mind mapping software used by GRiST; one such new facility generated novel mind maps to reflect the collected expert knowledge on any specified concept. Mind maps from GRiST are stored as XML, which suggested storing them in an XML database. In fact, the entire approach here is "XML-centric", in that all stages rely on XML as far as possible. An XML-based query language allows users to retrieve information from the mind map knowledge base. The approach, it was concluded, will prove valuable to mind mapping in general, and to detecting patterns in any type of digital information.
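A small illustration of ranking suggested corrections by textual similarity, in the spirit of the thesis's edit-distance step; this uses Python's stdlib difflib (ratio-based, closely related to but not identical to edit distance), and the vocabulary below is invented for the example:

```python
import difflib

def suggest_corrections(word, vocabulary, n=5):
    """Rank candidate corrections for a (possibly misspelled) word.
    difflib's ratio-based similarity serves as a readily available
    stand-in for the edit-distance ranking described in the thesis."""
    return difflib.get_close_matches(word, vocabulary, n=n, cutoff=0.6)

vocab = ["anxiety", "anxious", "suicide", "suicidal", "self-harm"]
print(suggest_corrections("anxios", vocab))  # ['anxious', 'anxiety']
```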
Abstract:
Bangla OCR (Optical Character Recognition) is long-awaited software for the Bengali community all over the world. Numerous efforts suggest that, due to the inherently complex nature of the Bangla alphabet and its word formation process, developing a high-fidelity OCR that produces reasonably acceptable output remains a challenge. One possible route to improvement is post-processing of the OCR's output; algorithms such as edit distance and n-gram statistics have been used to rectify misspelled words in language processing. This work presents the first known approach that uses these algorithms to replace misrecognized words produced by Bangla OCR. The assessment is made on a set of fifty documents written in Bangla script and uses a dictionary of 541,167 words. The proposed correction model corrects several words, lowering the recognition error rate by 2.87% and 3.18% for the character-based n-gram and edit distance algorithms, respectively. The developed system suggests a list of five alternatives for a misspelled word. In 33.82% of cases the correct word is the topmost of the five suggestions for the n-gram algorithm, while for the edit distance algorithm the first suggestion is correct in 36.31% of cases. This work opens avenues for further improvements in character recognition.
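The paper's exact models are not given in the abstract, but the character-n-gram side of such a corrector can be sketched as ranking dictionary words by n-gram overlap; the function names and the Jaccard scoring below are our assumptions:

```python
def char_ngrams(word, n=2):
    """Set of character n-grams of a word, with boundary markers."""
    padded = f"#{word}#"
    return {padded[i:i+n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a, b, n=2):
    """Jaccard similarity between the character n-gram sets of two words."""
    A, B = char_ngrams(a, n), char_ngrams(b, n)
    return len(A & B) / len(A | B)

def top_suggestions(word, dictionary, k=5):
    """The k dictionary words most similar to the misrecognized word."""
    return sorted(dictionary, key=lambda w: ngram_similarity(word, w),
                  reverse=True)[:k]
```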
Abstract:
The Australian Research Collaboration Service (ARCS) has been supporting a wide range of collaboration services and tools which allow researchers, groups and research communities to share ideas and collaborate across organisational boundaries.
This talk will give an introduction to a number of exciting technologies which are now available. Focus will be on two main areas: Video Collaboration Tools, allowing researchers to talk face-to-face and share data in real time, and Web Collaboration Tools, allowing researchers to share information and ideas with other like-minded researchers irrespective of distance or organisational structure. A number of examples will also be shown of how these technologies have been used within various research communities.
A brief introduction will be given to a number of services which ARCS now operates and/or supports, such as:
* EVO – a video conferencing application particularly suited to desktop or low-bandwidth use.
* AccessGrid – an open-source video conferencing and collaboration toolkit, well suited to room-to-room meetings.
* Sakai – an online collaboration and learning environment supporting teaching and learning, ad hoc group collaboration, portfolios and research collaboration.
* Plone – a ready-to-run content management system for managing web content, ideal for project groups, communities, web sites, extranets and intranets.
* Wikis – a way to easily create, edit and link pages together to build collaborative websites.
Abstract:
Over the past few years, considerable public and commercial attention has been directed at a phenomenon that is poised to fundamentally change the media landscape. Yahoo! bought Flickr. Google acquired YouTube. Rupert Murdoch bought MySpace and declared that the future of his NewsCorp empire lay in user-led content creation within such social media rather than in his many newspapers, television stations and other media interests (2005). Finally, TIME broke with its long-established tradition of nominating an outstanding individual as "Person of the Year" and instead chose "You": all of us who collaboratively create content online (2006). However, the significance of this user-led phenomenon does not lie in such (ultimately unimportant) honours, or even in the content of flagship websites such as YouTube and Flickr; rather, as a logical consequence of its underlying principles (which we examine further here), it is spread far more pervasively across the World Wide Web. What matters about the new phenomenon is not only the success of its most visible exponents, but also the "long tail" (Anderson 2006) of the many other user-led projects that have established themselves throughout the online world and are now beginning to spread even into the offline world.
Abstract:
Two different methods to measure binocular longitudinal corneal apex movements were applied synchronously. High-speed videokeratoscopy at a sampling frequency of 15 Hz and a custom-designed ultrasound distance sensor at 100 Hz were used for the left and the right eye, respectively. Four healthy subjects participated in the study. Simultaneously, the cardiac electric cycle (ECG) was registered for each subject at 100 Hz. Each measurement took 20 s. Subjects were asked to suppress blinking during the measurements. A rigid headrest and a bite-bar were used to minimize undesirable head movements. Time, frequency and time-frequency representations of the acquired signals were obtained to establish their temporal and spectral contents. Coherence analysis was used to estimate the correlation between the measured signals. The results showed close correlation between both corneal apex movements and the cardiopulmonary system. Unraveling these relationships could lead to better understanding of interactions between ocular biomechanics and vision. The advantages and disadvantages of the two methods in the context of measuring longitudinal movements of the corneal apex are outlined.
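Coherence analysis of the kind described can be reproduced with standard tools; the sketch below uses scipy.signal.coherence on synthetic stand-ins for the apex and ECG signals (the 1.2 Hz cardiac component and noise levels are invented for illustration, not the study's data):

```python
import numpy as np
from scipy.signal import coherence

# Hypothetical 20 s recordings: corneal apex displacement and a
# synchronously acquired ECG signal, both sampled at 100 Hz.
fs = 100.0
t = np.arange(0, 20, 1 / fs)
heart = np.sin(2 * np.pi * 1.2 * t)                 # ~72 bpm cardiac component
apex = 0.3 * heart + 0.1 * np.random.randn(t.size)  # apex motion with cardiac coupling
ecg = heart + 0.05 * np.random.randn(t.size)

# Magnitude-squared coherence estimates the correlation between the two
# signals as a function of frequency (values near 1 = strong coupling).
f, Cxy = coherence(apex, ecg, fs=fs, nperseg=512)
print(f[np.argmax(Cxy)])  # frequency of strongest coupling, ~1.2 Hz here
```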