908 results for Metric Embeddings


Relevance:

60.00%

Abstract:

We analyze an approach to a similarity preserving coding of symbol sequences based on neural distributed representations and show that it can be viewed as a metric embedding process.

Relevance:

60.00%

Abstract:

We discuss several approaches to similarity preserving coding of symbol sequences and possible connections of their distributed versions to metric embeddings. Interpreting sequence representation methods with embeddings can help develop an approach to their analysis and may lead to discovering useful properties.

Relevance:

30.00%

Abstract:

BoostMap is a recently proposed method for efficient approximate nearest neighbor retrieval in arbitrary non-Euclidean spaces with computationally expensive and possibly non-metric distance measures. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. The key idea is formulating embedding construction as a machine learning task, where AdaBoost is used to combine simple, 1D embeddings into a multidimensional embedding that preserves a large amount of the proximity structure of the original space. This paper demonstrates that, using the machine learning formulation of BoostMap, we can optimize embeddings for indexing and classification, in ways that are not possible with existing alternatives for constructive embeddings, and without additional costs in retrieval time. First, we show how to construct embeddings that are query-sensitive, in the sense that they yield a different distance measure for different queries, so as to improve nearest neighbor retrieval accuracy for each query. Second, we show how to optimize embeddings for nearest neighbor classification tasks, by tuning them to approximate a parameter space distance measure, instead of the original feature-based distance measure.
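The embed-then-compare idea in this abstract can be illustrated with a minimal sketch. It is not the BoostMap training procedure: the AdaBoost step that selects 1D embeddings and learns their weights is omitted, and the weights are simply passed in. The sketch assumes each 1D embedding is the distance to a reference object, uses edit distance as a stand-in for an expensive distance measure, and adds a filter-and-refine retrieval step; the function names and these choices are assumptions for illustration and are not stated in the abstract.

```python
from typing import Callable, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; a stand-in for an expensive distance measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def embed(x: str, references: Sequence[str],
          dist: Callable[[str, str], float]) -> List[float]:
    """Stack 1D embeddings; each coordinate is the distance to one reference object (assumed form)."""
    return [float(dist(x, r)) for r in references]


def weighted_manhattan(u: Sequence[float], v: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted L1 distance between two embedded vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


def filter_and_refine(query: str, database: Sequence[str],
                      references: Sequence[str], weights: Sequence[float],
                      dist: Callable[[str, str], float], p: int) -> str:
    """Rank by the cheap embedded distance, then re-rank the top p with the exact distance."""
    q_vec = embed(query, references, dist)
    # In practice the database vectors would be computed once, offline.
    db_vecs = [embed(x, references, dist) for x in database]
    ranked = sorted(range(len(database)),
                    key=lambda i: weighted_manhattan(q_vec, db_vecs[i], weights))
    candidates = [database[i] for i in ranked[:p]]
    return min(candidates, key=lambda x: dist(query, x))


if __name__ == "__main__":
    words = ["kitten", "sitting", "mitten", "banana", "bandana"]
    # Hypothetical reference objects and hand-picked weights (not learned).
    print(filter_and_refine("bitten", words, ["kitten", "banana"],
                            [1.0, 1.0], edit_distance, p=2))
```

At query time only the distances from the query to the reference objects, plus p exact distances, need to be computed, which is where the speed-up over exhaustive search would come from.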

Relevance:

30.00%

Abstract:

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations. 
The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.
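The cascade described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the method of the thesis: it assumes each stage is a k-nearest-neighbor classifier under a progressively more expensive distance, and that a case counts as "easy" when the k neighbors agree on the label. The agreement threshold, the function names, and the toy distances are assumptions.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def knn_vote(query: Point, labeled: Sequence[Tuple[Point, str]],
             dist: Callable[[Point, Point], float], k: int) -> Tuple[str, float]:
    """Return the majority label among the k nearest neighbors and its vote share."""
    nearest = sorted(labeled, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def cascade_classify(query: Point, labeled: Sequence[Tuple[Point, str]],
                     stages: List[Callable[[Point, Point], float]],
                     k: int = 3, threshold: float = 1.0) -> Tuple[str, int]:
    """Try cheap stages first; stop at the first confident stage (the last stage always answers)."""
    for i, dist in enumerate(stages):
        label, agreement = knn_vote(query, labeled, dist, k)
        if i == len(stages) - 1 or agreement >= threshold:
            return label, i
    raise ValueError("stages must not be empty")


def cheap(p: Point, q: Point) -> float:
    """Cheap, lossy distance: looks at the first coordinate only."""
    return abs(p[0] - q[0])


def exact(p: Point, q: Point) -> float:
    """Stand-in for the expensive distance: full Manhattan distance."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


if __name__ == "__main__":
    data = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"), ((5, 0), "a"),
            ((10, 0), "b"), ((11, 0), "b"), ((10, 1), "b"), ((5, 9), "b"), ((6, 9), "b")]
    for q in [(1, 1), (4, 1)]:
        print(q, cascade_classify(q, data, [cheap, exact]))
```

The second element of the returned pair is the index of the stage that produced the answer, which makes it easy to check how often the expensive stage is actually reached.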

Relevance:

20.00%

Abstract:

Dynamic load sharing can be defined as a measure of the ability of a heavy vehicle multi-axle group to equalise load across its wheels under typical travel conditions; i.e. in the dynamic sense at typical travel speeds and operating conditions of that vehicle. Various attempts have been made to quantify the ability of heavy vehicles to equalise the load across their wheels during travel. One of these was the concept of the load sharing coefficient (LSC). Other metrics such as the dynamic load coefficient (DLC) have been used to compare one heavy vehicle suspension with another for potential road damage. This paper compares these metrics and determines a relationship between DLC and LSC with sensitivity analysis of this relationship. The shortcomings of these presently-available metrics are discussed with a new metric proposed - the dynamic load equalisation (DLE) measure.

Relevance:

20.00%

Abstract:

This paper presents a general, global approach to the problem of robot exploration, utilizing a topological data structure to guide an underlying Simultaneous Localization and Mapping (SLAM) process. A Gap Navigation Tree (GNT) is used to motivate global target selection and occluded regions of the environment (called “gaps”) are tracked probabilistically. The process of map construction and the motion of the vehicle alter both the shape and location of these regions. The use of online mapping is shown to reduce the difficulties in implementing the GNT.

Relevance:

20.00%

Abstract:

Effective enterprise information security policy management requires review and assessment activities to ensure information security policies are aligned with business goals and objectives. As security policy management involves the elements of policy development process and the security policy as output, the context for security policy assessment requires goal-based metrics for these two elements. However, the current security management assessment methods only provide checklist types of assessment that are predefined by industry best practices and do not allow for developing specific goal-based metrics. Utilizing theories drawn from literature, this paper proposes the Enterprise Information Security Policy Assessment approach that expands on the Goal-Question-Metric (GQM) approach. The proposed assessment approach is then applied in a case scenario example to illustrate a practical application. It is shown that the proposed framework addresses the requirement for developing assessment metrics and allows for the concurrent undertaking of process-based and product-based assessment. Recommendations for further research activities include the conduct of empirical research to validate the propositions and the practical application of the proposed assessment approach in case studies to provide opportunities to introduce further enhancements to the approach.

Relevance:

20.00%

Abstract:

A theoretical basis is required for comparing key features and critical elements in wild fisheries and aquaculture supply chains under a changing climate. Here we develop a new quantitative metric that is analogous to indices used to analyse food-webs and identify key species. The Supply Chain Index (SCI) identifies critical elements as those elements with large throughput rates, as well as greater connectivity. The sum of the scores for a supply chain provides a single metric that roughly captures both the resilience and connectedness of a supply chain. Standardised scores can facilitate cross-comparisons both under current conditions as well as under a changing climate. Identification of key elements along the supply chain may assist in informing adaptation strategies to reduce anticipated future risks posed by climate change. The SCI also provides information on the relative stability of different supply chains based on whether there is a fairly even spread in the individual scores of the top few key elements, compared with a more critical dependence on a few key individual supply chain elements. We use as a case study the Australian southern rock lobster Jasus edwardsii fishery, which is challenged by a number of climate change drivers such as impacts on recruitment and growth due to changes in large-scale and local oceanographic features. The SCI identifies airports, processors and Chinese consumers as the key elements in the lobster supply chain that merit attention to enhance stability and potentially enable growth. We also apply the index to an additional four real-world Australian commercial fishery and two aquaculture industry supply chains to highlight the utility of a systematic method for describing supply chains. Overall, our simple methodological approach to empirically-based supply chain research provides an objective method for comparing the resilience of supply chains and highlighting components that may be critical.

Relevance:

20.00%

Abstract:

This study investigates the use of unsupervised features derived from word embedding approaches and novel sequence representation approaches for improving clinical information extraction systems. Our results corroborate previous findings that indicate that the use of word embeddings significantly improves the effectiveness of concept extraction models; however, we further determine the influence that the corpora used to generate such features have. We also demonstrate the promise of sequence-based unsupervised features for further improving concept extraction.

Relevance:

20.00%

Abstract:

Recent advances in neural language models have contributed new methods for learning distributed vector representations of words (also called word embeddings). Two such methods are the continuous bag-of-words model and the skipgram model. These methods have been shown to produce embeddings that capture higher order relationships between words that are highly effective in natural language processing tasks involving the use of word similarity and word analogy. Despite these promising results, there has been little analysis of the use of these word embeddings for retrieval. Motivated by these observations, in this paper, we set out to determine how these word embeddings can be used within a retrieval model and what the benefit might be. To this aim, we use neural word embeddings within the well known translation language model for information retrieval. This language model captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance. The word embeddings used to estimate neural language models produce translations that differ from previous translation language model approaches; differences that deliver improvements in retrieval effectiveness. The models are robust to choices made in building word embeddings and, even more so, our results show that embeddings do not even need to be produced from the same corpus being used for retrieval.
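A translation language model of the kind mentioned in this abstract scores a query term w against a document d as p(w|d) = sum over u in d of p_t(w|u) p(u|d). The sketch below shows one way word embeddings could supply p_t. The abstract does not give the estimator, so the choice made here (cosine similarity clipped at zero and normalized over the vocabulary), the toy vectors, and the absence of collection smoothing are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical 2-D "embeddings", for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "car": [1.0, 0.1],
    "automobile": [0.9, 0.2],
    "banana": [0.0, 1.0],
    "fruit": [0.1, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translation_prob(w: str, u: str, vectors: Dict[str, List[float]]) -> float:
    """Assumed estimator: p_t(w|u) proportional to max(cos(w, u), 0), normalized over the vocabulary."""
    sims = {t: max(cosine(vectors[t], vectors[u]), 0.0) for t in vectors}
    total = sum(sims.values())
    return sims[w] / total if total else 0.0


def doc_prob(w: str, doc: Sequence[str], vectors: Dict[str, List[float]]) -> float:
    """p(w|d) = sum_u p_t(w|u) * p(u|d), with maximum-likelihood p(u|d)."""
    counts = Counter(t for t in doc if t in vectors)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum(translation_prob(w, u, vectors) * c / n for u, c in counts.items())


def score(query: Sequence[str], doc: Sequence[str],
          vectors: Dict[str, List[float]], eps: float = 1e-12) -> float:
    """Log-likelihood of the query under the document model (eps avoids log(0); no smoothing)."""
    return sum(math.log(doc_prob(w, doc, vectors) + eps) for w in query if w in vectors)


if __name__ == "__main__":
    print(score(["automobile"], ["car", "car"], TOY_VECTORS))
    print(score(["automobile"], ["banana", "fruit"], TOY_VECTORS))
```

Because p_t spreads probability mass to related words, a document that never contains "automobile" can still receive a non-trivial score for that query term if it contains "car".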

Relevance:

20.00%

Abstract:

This thesis studies homogeneous classes of complete metric spaces. Over the past few decades model theory has been extended to cover a variety of nonelementary frameworks. Shelah introduced the abstract elementary classes (AEC) in the 1980s as a common framework for the study of nonelementary classes. Another direction of extension has been the development of model theory for metric structures. This thesis takes a step in the direction of combining these two by introducing an AEC-like setting for studying metric structures. To find balance between generality and the possibility to develop stability theoretic tools, we work in a homogeneous context, thus extending the usual compact approach. The homogeneous context enables the application of stability theoretic tools developed in discrete homogeneous model theory. Using these we prove categoricity transfer theorems for homogeneous metric structures with respect to isometric isomorphisms. We also show how generalized isomorphisms can be added to the class, giving a model theoretic approach to, e.g., Banach space isomorphisms or operator approximations. The novelty is the built-in treatment of these generalized isomorphisms making, e.g., stability up to perturbation the natural stability notion. With respect to these generalized isomorphisms we develop a notion of independence. It behaves well already for structures which are omega-stable up to perturbation and coincides with the one from classical homogeneous model theory over saturated enough models. We also introduce a notion of isolation and prove dominance for it.

Relevance:

20.00%

Abstract:

Various Tb theorems play a key role in modern harmonic analysis. They provide characterizations for the boundedness of Calderón-Zygmund type singular integral operators. The general philosophy is that to conclude the boundedness of an operator T on some function space, one needs only to test it on some suitable function b. The main object of this dissertation is to prove very general Tb theorems. The dissertation consists of four research articles and an introductory part. The framework is general with respect to the domain (a metric space), the measure (an upper doubling measure) and the range (a UMD Banach space). Moreover, the testing conditions used are weak. In the first article a (global) Tb theorem on non-homogeneous metric spaces is proved. One of the main technical components is the construction of a randomization procedure for the metric dyadic cubes. The difficulty lies in the fact that metric spaces do not, in general, have a translation group. Also, the measures considered are more general than in the existing literature. This generality is genuinely important for some applications, including the result of Volberg and Wick concerning the characterization of measures for which the analytic Besov-Sobolev space embeds continuously into the space of square integrable functions. In the second article a vector-valued extension of the main result of the first article is considered. This theorem is a new contribution to the vector-valued literature, since previously such general domains and measures were not allowed. The third article deals with local Tb theorems both in the homogeneous and non-homogeneous situations. A modified version of the general non-homogeneous proof technique of Nazarov, Treil and Volberg is extended to cover the case of upper doubling measures. This technique is also used in the homogeneous setting to prove local Tb theorems with weak testing conditions introduced by Auscher, Hofmann, Muscalu, Tao and Thiele.
This gives a completely new and direct proof of such results utilizing the full force of non-homogeneous analysis. The final article has to do with sharp weighted theory for maximal truncations of Calderón-Zygmund operators. This includes a reduction to certain Sawyer-type testing conditions, which are in the spirit of Tb theorems and thus of the dissertation. The article extends the sharp bounds previously known only for untruncated operators, and also proves sharp weak type results, which are new even for untruncated operators. New techniques are introduced to overcome the difficulties introduced by the non-linearity of maximal truncations.