986 results for distance measures


Relevance:

60.00% 60.00%

Publisher:

Abstract:

Design of speaker identification schemes for a small number of speakers (around 10) with a high degree of accuracy in a controlled environment is a practical proposition today. When the number of speakers is large (say 50–100), many of these schemes cannot be directly extended, as both recognition error and computation time increase monotonically with population size. The feature selection problem is also complex for such schemes. Though there were earlier attempts to rank-order features based on statistical distance measures, it has been observed only recently that the two best independent measurements are not necessarily the best pair of measurements for pattern classification. We propose here a systematic approach to the problem using the decision tree or hierarchical classifier with the following objectives: (1) Design of the optimal policy at each node of the tree given the tree structure, i.e., the tree skeleton and the features to be used at each node. (2) Determination of the optimal feature measurement and decision policy given only the tree skeleton. Applicability of optimization procedures such as dynamic programming in the design of such trees is studied. The experimental results deal with the design of a 50-speaker identification scheme based on this approach.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The problem of characterizing global sensitivity indices of structural response is considered for cases in which system uncertainties are represented using probabilistic and/or non-probabilistic modeling frameworks (including intervals, convex functions, and fuzzy variables). These indices are characterized in terms of distance measures between a fiducial model, in which uncertainties in all the pertinent variables are taken into account, and a family of hypothetical models in which uncertainty in one or more selected variables is suppressed. The distance measures considered include various probability distance measures (the Hellinger, ℓ2, and Kantorovich metrics, and the Kullback-Leibler divergence) and the Hausdorff distance measure as applied to intervals and fuzzy variables. Illustrations include studies on a building frame with uncertain parameters carrying uncertain loads. (C) 2015 Elsevier Ltd. All rights reserved.
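To make two of the probability distance measures named above concrete, the following is a minimal sketch for discrete distributions. The distributions `fiducial` and `suppressed` are invented placeholders, not values from the paper, and treating the distance itself as the sensitivity index is an assumption made only for illustration.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical discretised response distributions (made-up numbers):
fiducial = [0.10, 0.20, 0.40, 0.20, 0.10]    # all variables uncertain
suppressed = [0.05, 0.15, 0.60, 0.15, 0.05]  # one variable held fixed

# A larger distance suggests the suppressed variable matters more.
index_hellinger = hellinger(fiducial, suppressed)
index_kl = kl_divergence(fiducial, suppressed)
```

Note that the Kullback-Leibler divergence is not symmetric, so the order of the two arguments is a modelling choice.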

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Many transductive inference algorithms assume that distributions over training and test estimates should be related, e.g. by providing a large margin of separation on both sets. We use this idea to design a transduction algorithm which can be used without modification for classification, regression, and structured estimation. At its heart we exploit the fact that for a good learner the distributions over the outputs on training and test sets should match. This is a classical two-sample problem which can be solved efficiently in its most general form by using distance measures in Hilbert Space. It turns out that a number of existing heuristics can be viewed as special cases of our approach.
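One widely used Hilbert-space distance for the two-sample problem is the maximum mean discrepancy (MMD). The abstract does not name the estimator it uses, so the sketch below is an assumption: a biased MMD² estimate with a Gaussian kernel on scalar outputs, with the function names and the bandwidth chosen only for illustration.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Hypothetical learner outputs on training and test sets:
train_outputs = [0.0, 1.0, 2.0]
test_outputs_close = [0.1, 1.1, 2.1]
test_outputs_far = [5.0, 6.0, 7.0]

mismatch_close = mmd_squared(train_outputs, test_outputs_close)
mismatch_far = mmd_squared(train_outputs, test_outputs_far)
```

A transductive learner in this spirit would prefer predictions on the test set that keep such a mismatch small.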

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper introduces BoostMap, a method that can significantly reduce retrieval time in image and video database systems that employ computationally expensive distance measures, metric or non-metric. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. Embedding construction is formulated as a machine learning task, where AdaBoost is used to combine many simple, 1D embeddings into a multidimensional embedding that preserves a significant amount of the proximity structure in the original space. Performance is evaluated in a hand pose estimation system, and a dynamic gesture recognition system, where the proposed method is used to retrieve approximate nearest neighbors under expensive image and video similarity measures. In both systems, BoostMap significantly increases efficiency, with minimal losses in accuracy. Moreover, the experiments indicate that BoostMap compares favorably with existing embedding methods that have been employed in computer vision and database applications, i.e., FastMap and Bourgain embeddings.
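The retrieval side of this idea can be sketched as follows. This is not the BoostMap training procedure: the AdaBoost step that selects embeddings and weights is omitted, the weights are hand-picked, and `toy_distance` is a hypothetical stand-in for an expensive measure. Each 1D embedding is assumed here to be the distance to a fixed reference object, which is one simple choice.

```python
def make_reference_embedding(reference, distance):
    """1D embedding: map an object to its distance from a fixed reference object."""
    return lambda x: distance(x, reference)

def embed(x, embeddings):
    """Stack several 1D embeddings into one vector."""
    return [f(x) for f in embeddings]

def weighted_manhattan(u, v, weights):
    """Weighted L1 distance between two embedded vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))

def toy_distance(a, b):
    """Stand-in for an expensive measure: mismatched positions plus length gap."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))

database = ["kitten", "sitting", "mitten", "banana"]
references = ["kitten", "banana"]
weights = [0.7, 0.3]  # hypothetical; BoostMap would learn these with AdaBoost

embeddings = [make_reference_embedding(r, toy_distance) for r in references]
# Database objects are embedded once, offline.
db_vectors = {obj: embed(obj, embeddings) for obj in database}

def approximate_nearest(query):
    """Embed the query, then compare cheaply in the vector space."""
    qv = embed(query, embeddings)
    return min(database,
               key=lambda obj: weighted_manhattan(qv, db_vectors[obj], weights))
```

At query time only `len(references)` evaluations of the expensive measure are needed, regardless of database size.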

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper introduces an algorithm that uses boosting to learn a distance measure for multiclass k-nearest neighbor classification. Given a family of distance measures as input, AdaBoost is used to learn a weighted distance measure, that is, a linear combination of the input measures. The proposed method can be seen both as a novel way to learn a distance measure from data, and as a novel way to apply boosting to multiclass recognition problems that does not require output codes. In our approach, multiclass recognition of objects is reduced to a single binary recognition task, defined on triples of objects. Preliminary experiments with eight UCI datasets yield no clear winner among our method, boosting using output codes, and k-nn classification using an unoptimized distance measure. Our algorithm did achieve lower error rates in some of the datasets, which indicates that, in some domains, it may lead to better results than existing methods.
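The reduction to a binary task on triples can be illustrated with a short sketch. The exact formulation in the paper is not given in the abstract, so the following is an assumption: a triple (q, a, b) is labelled by whether q is closer to a or to b under a weighted sum of input measures. The two coordinate-wise measures and the weights are hypothetical; the boosting step that would learn the weights is not shown.

```python
def combined_distance(x, y, measures, weights):
    """Linear combination of the input distance measures."""
    return sum(w * d(x, y) for w, d in zip(weights, measures))

def classify_triple(q, a, b, measures, weights):
    """Return +1 if q is closer to a than to b under the combined measure, else -1."""
    margin = (combined_distance(q, b, measures, weights)
              - combined_distance(q, a, measures, weights))
    return 1 if margin > 0 else -1

# Two hypothetical input measures on 2D points: one per coordinate.
def horizontal_gap(x, y):
    return abs(x[0] - y[0])

def vertical_gap(x, y):
    return abs(x[1] - y[1])

measures = [horizontal_gap, vertical_gap]
q, a, b = (0, 0), (0, 5), (3, 0)

# The weights decide which neighbour counts as closer.
prefers_a = classify_triple(q, a, b, measures, [1.0, 0.0])
prefers_b = classify_triple(q, a, b, measures, [0.0, 1.0])
```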

Relevance:

60.00% 60.00%

Publisher:

Abstract:

BoostMap is a recently proposed method for efficient approximate nearest neighbor retrieval in arbitrary non-Euclidean spaces with computationally expensive and possibly non-metric distance measures. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. The key idea is formulating embedding construction as a machine learning task, where AdaBoost is used to combine simple, 1D embeddings into a multidimensional embedding that preserves a large amount of the proximity structure of the original space. This paper demonstrates that, using the machine learning formulation of BoostMap, we can optimize embeddings for indexing and classification, in ways that are not possible with existing alternatives for constructive embeddings, and without additional costs in retrieval time. First, we show how to construct embeddings that are query-sensitive, in the sense that they yield a different distance measure for different queries, so as to improve nearest neighbor retrieval accuracy for each query. Second, we show how to optimize embeddings for nearest neighbor classification tasks, by tuning them to approximate a parameter space distance measure, instead of the original feature-based distance measure.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A common problem in many types of databases is retrieving the most similar matches to a query object. Finding those matches in a large database can be too slow to be practical, especially in domains where objects are compared using computationally expensive similarity (or distance) measures. This paper proposes a novel method for approximate nearest neighbor retrieval in such spaces. Our method is embedding-based, meaning that it constructs a function that maps objects into a real vector space. The mapping preserves a large amount of the proximity structure of the original space, and it can be used to rapidly obtain a short list of likely matches to the query. The main novelty of our method is that it constructs, together with the embedding, a query-sensitive distance measure that should be used when measuring distances in the vector space. The term "query-sensitive" means that the distance measure changes depending on the current query object. We report experiments with an image database of handwritten digits, and a time-series database. In both cases, the proposed method outperforms existing state-of-the-art embedding methods, meaning that it provides significantly better trade-offs between efficiency and retrieval accuracy.
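A query-sensitive distance measure, in the sense defined above, can be sketched as a weighted Manhattan distance whose weights are a function of the query. The thresholding rule in `example_weight_fn` is purely hypothetical and is not the rule learned by the method in the paper.

```python
def query_sensitive_distance(query_vec, db_vec, weight_fn):
    """Weighted L1 distance where the weights depend on the query vector."""
    weights = weight_fn(query_vec)
    return sum(w * abs(q - x) for w, q, x in zip(weights, query_vec, db_vec))

def example_weight_fn(query_vec):
    """Hypothetical rule: trust a coordinate more when the query's value there is small."""
    return [1.0 if q < 2.0 else 0.1 for q in query_vec]

db_vec = [2.0, 1.0]
# The same database vector is measured differently for different queries.
d_first = query_sensitive_distance([1.0, 5.0], db_vec, example_weight_fn)
d_second = query_sensitive_distance([5.0, 1.0], db_vec, example_weight_fn)
```

Because the weights are recomputed per query, the resulting function is generally not symmetric and need not be a metric.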

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations. 
The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.
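The cascade idea in the third contribution can be sketched generically. The stage classifiers, confidence scores and thresholds below are hypothetical toys, not the embedding-based classifiers of the thesis; the sketch only shows the control flow in which cheap stages answer easy cases and harder cases fall through to a more expensive classifier.

```python
def cascade_classify(x, stages, final_classifier):
    """Run cheap stages first; stop at the first sufficiently confident one.

    stages: list of (classifier, threshold) pairs, where each classifier
    returns a (label, confidence) tuple.
    """
    for classifier, threshold in stages:
        label, confidence = classifier(x)
        if confidence >= threshold:
            return label
    return final_classifier(x)[0]

calls = {"cheap": 0, "expensive": 0}  # counts how often each stage runs

def cheap(x):
    """Toy fast classifier: sign of x, confident only far from zero."""
    calls["cheap"] += 1
    return ("positive" if x > 0 else "negative", abs(x))

def expensive(x):
    """Toy slow classifier with a slightly different decision boundary."""
    calls["expensive"] += 1
    return ("positive" if x > -0.1 else "negative", 1.0)

stages = [(cheap, 1.0)]
```

For example, `cascade_classify(5.0, stages, expensive)` is settled by the cheap stage alone, while `cascade_classify(-0.05, stages, expensive)` falls through to the expensive classifier.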

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The notion of diversity is an issue that is of relevance in several contexts. For example, the biodiversity of a given ecological environment and the diversity of the options available to a decision maker have attracted some attention in recent research. This paper provides an axiomatic approach to the measurement of diversity. We characterize two nested classes of ordinal measures of diversity and an important member of these classes. We prove that the latter special case is equivalent to a diversity ordering proposed by Weitzman.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

An ongoing controversy in Amazonian palaeoecology is the manner in which Amazonian rainforest communities have responded to environmental change over the last glacial–interglacial cycle. Much of this controversy results from an inability to identify the floristic heterogeneity exhibited by rainforest communities within fossil pollen records. We apply multivariate (Principal Components Analysis) and classification (Unweighted Pair Group with Arithmetic Mean Agglomerative Classification) techniques to floral-biometric, modern pollen trap and lake sediment pollen data situated within different rainforest communities in the tropical lowlands of Amazonian Bolivia. Modern pollen rain analyses from artificial pollen traps show that evergreen terra firme (well-drained), evergreen terra firme liana, evergreen seasonally inundated, and evergreen riparian rainforests may be readily differentiated, floristically and palynologically. Analogue matching techniques, based on Euclidean distance measures, are employed to compare these pollen signatures with surface sediment pollen assemblages from five lakes: Laguna Bella Vista, Laguna Chaplin, and Laguna Huachi situated within the Madeira-Tapajós moist forest ecoregion, and Laguna Isirere and Laguna Loma Suarez, which are situated within forest patches in the Beni savanna ecoregion. The same numerical techniques are used to compare rainforest pollen trap signatures with the fossil pollen record of Laguna Chaplin.
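Analogue matching with a Euclidean distance measure can be sketched as a nearest-reference search over pollen spectra. The taxon proportions below are invented for illustration and are not data from the study; the function names are likewise assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_analogue(sample, references):
    """Return (name, distance) of the reference spectrum closest to the sample."""
    name = min(references, key=lambda n: euclidean(sample, references[n]))
    return name, euclidean(sample, references[name])

# Made-up proportions of three pollen taxa per forest type:
references = {
    "terra firme": [0.6, 0.3, 0.1],
    "seasonally inundated": [0.3, 0.5, 0.2],
    "riparian": [0.2, 0.2, 0.6],
}
lake_sample = [0.5, 0.3, 0.2]  # hypothetical surface-sediment assemblage

closest_type, dissimilarity = best_analogue(lake_sample, references)
```

In practice a dissimilarity threshold is often applied so that a sample with no close reference is reported as having no analogue.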

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Finding the best facility locations requires a complete and accurate road network together with the corresponding population data for a specific area. However, the data obtained from road network databases are usually not fit for this use. In this paper we propose a procedure for converting a road network database into a road graph that can be used in localization problems. The road network data come from the National Road Database in Sweden. The derived graph is cleaned and reduced to a level suitable for localization problems. The population points are also processed in order to match that graph. The reduction of the graph is done while maintaining most of the accuracy of distance measures in the network.
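The abstract does not describe how the graph is reduced. One common simplification that preserves network distances between the remaining nodes is to contract intermediate nodes of degree two, summing the lengths of the two edges they join. The sketch below assumes that technique and an undirected graph stored as nested dictionaries without self-loops; it is not the authors' procedure.

```python
def contract_degree_two(graph, keep):
    """Remove degree-2 nodes not in `keep`, merging their two edges.

    graph: {node: {neighbor: length}}, undirected and symmetric.
    Returns a reduced copy; the input is left unchanged.
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in list(g):
            if node in keep or len(g[node]) != 2:
                continue
            (u, du), (w, dw) = g[node].items()
            new_len = du + dw
            # If u and w are already linked, keep the shorter connection.
            if w in g[u]:
                new_len = min(new_len, g[u][w])
            g[u][w] = g[w][u] = new_len
            del g[u][node], g[w][node], g[node]
            changed = True
    return g

# Hypothetical road fragment: A - x - y - B, with lengths in km.
roads = {
    "A": {"x": 1.0},
    "x": {"A": 1.0, "y": 2.0},
    "y": {"x": 2.0, "B": 1.5},
    "B": {"y": 1.5},
}
reduced = contract_degree_two(roads, keep={"A", "B"})
```

Nodes listed in `keep` (for instance, junctions with attached population points) survive the reduction even when they have degree two.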

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The purpose of this paper was to evaluate attributes derived from fully polarimetric PALSAR data to discriminate and map macrophyte species in the Amazon floodplain wetlands. Fieldwork was carried out almost simultaneously with the radar acquisition, and macrophyte biomass and morphological variables were measured in the field. Attributes were calculated from the covariance matrix [C] derived from the single-look complex data. Image attributes and macrophyte variables were compared and analyzed to investigate the sensitivity of the attributes for discriminating among species. Based on these analyses, a rule-based classification was applied to map macrophyte species. Other classification approaches were tested and compared to the rule-based method: a classification based on the Freeman-Durden and Cloude-Pottier decomposition models, a hybrid classification (Wishart classifier with the input classes based on the H/α plane), and a statistical-based classification (supervised classification using Wishart distance measures). The findings show that attributes derived from fully polarimetric L-band data have good potential for discriminating herbaceous plant species based on morphology and that estimation of plant biomass and productivity could be improved by using these polarimetric attributes.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A high rate of root exposure, and consequently exposure of the furcation area, is usually observed in multirooted teeth. In maxillary molar teeth, this may endanger the three existing furcations (buccal, mesial and distal), causing serious problems. In this research, the following distances were established: from the buccal furcation to the mesial (F1M) and distal (F1D) surfaces of the mesio-buccal and disto-buccal roots, respectively; from the mesial furcation to the buccal (F2B) and palatal (F2P) surfaces of the mesio-buccal and palatal roots, respectively; and from the distal furcation to the buccal (F3B) and palatal (F3P) surfaces of the disto-buccal and palatal roots, respectively. One hundred maxillary first molar teeth were used, 50 from the right side and 50 from the left. Reference marks and demarcations were determined on the furcations and also on the root surfaces involved in the measurements. We concluded that these measurements are important because they may effectively contribute to the diagnosis, prevention and treatment of periodontal problems.