999 results for image indexing


Relevance:

30.00%

Publisher:

Abstract:

Rapid judgments about the properties and spatial relations of objects are the crux of visually guided interaction with the world. Vision begins, however, with essentially pointwise representations of the scene, such as arrays of pixels or small edge fragments. For adequate time performance in recognition, manipulation, navigation, and reasoning, the processes that extract meaningful entities from the pointwise representations must exploit parallelism. This report develops a framework for the fast extraction of scene entities, based on a simple, local model of parallel computation. An image chunk is a subset of an image that can act as a unit in the course of spatial analysis. A parallel preprocessing stage constructs a variety of simple chunks uniformly over the visual array. On the basis of these chunks, subsequent serial processes locate relevant scene components and assemble detailed descriptions of them rapidly. This thesis defines image chunks that facilitate the most potentially time-consuming operations of spatial analysis: boundary tracing, area coloring, and the selection of locations at which to apply detailed analysis. Fast parallel processes for computing these chunks from images, and chunk-based formulations of indexing, tracing, and coloring, are presented. These processes have been simulated and evaluated on the Lisp Machine and the Connection Machine.
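
As a rough illustration of the chunk idea (not the thesis's parallel algorithm; the function name and chunk summary fields are made up), the sketch below groups local edge pixels into connected fragments that a later serial stage could treat as units:

    import numpy as np
    from scipy import ndimage

    def extract_chunks(edge_image):
        """Group edge pixels into connected fragments ("chunks") with summary data."""
        labels, _ = ndimage.label(edge_image > 0)
        chunks = []
        for i, slc in enumerate(ndimage.find_objects(labels), start=1):
            chunks.append({
                "label": i,
                "bbox": slc,                            # bounding box of the fragment
                "size": int((labels[slc] == i).sum()),  # number of edge pixels in it
            })
        return chunks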

Relevance:

30.00%

Publisher:

Abstract:

A method is proposed that can generate a ranked list of plausible three-dimensional hand configurations that best match an input image. Hand pose estimation is formulated as an image database indexing problem, where the closest matches for an input hand image are retrieved from a large database of synthetic hand images. In contrast to previous approaches, the system can function in the presence of clutter, thanks to two novel clutter-tolerant indexing methods. First, a computationally efficient approximation of the image-to-model chamfer distance is obtained by embedding binary edge images into a high-dimensional Euclidean space. Second, a general-purpose, probabilistic line matching method identifies those line segment correspondences between model and input images that are the least likely to have occurred by chance. The performance of this clutter-tolerant approach is demonstrated in quantitative experiments with hundreds of real hand images.
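
For reference, the exact directed chamfer distance that the abstract approximates can be computed with a distance transform; this is a generic sketch, not the paper's embedding-based approximation:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def directed_chamfer(model_edges, image_edges):
        """Mean distance from each model edge pixel to the nearest input edge pixel."""
        # Distance transform of the background: at every pixel, the distance
        # to the closest edge pixel of the input image.
        dt = distance_transform_edt(image_edges == 0)
        ys, xs = np.nonzero(model_edges)
        return float(dt[ys, xs].mean()) if len(ys) else 0.0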

Relevance:

30.00%

Publisher:

Abstract:

Estimation of 3D hand pose is useful in many gesture recognition applications, ranging from human-computer interaction to automated recognition of sign languages. In this paper, 3D hand pose estimation is treated as a database indexing problem. Given an input image of a hand, the most similar images in a large database of hand images are retrieved. The hand pose parameters of the retrieved images are used as estimates for the hand pose in the input image. Lipschitz embeddings of edge images into a Euclidean space are used to improve the efficiency of database retrieval. In order to achieve interactive retrieval times, similarity queries are initially performed in this Euclidean space. The paper describes ongoing work that focuses on how best to choose reference images in order to improve retrieval accuracy.
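
A minimal sketch of the Lipschitz-embedding idea, assuming some image-to-image distance function and a chosen set of reference images (both placeholders here): every image is represented by its vector of distances to the references, so candidate retrieval reduces to cheap Euclidean comparisons in that vector space:

    import numpy as np

    def lipschitz_embed(distance_fn, reference_images, image):
        """Map an image to its vector of distances to the reference images."""
        return np.array([distance_fn(image, r) for r in reference_images])

    def embedded_distance(u, v):
        """Cheap comparison used for the initial similarity queries."""
        return float(np.linalg.norm(u - v))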

Relevance:

30.00%

Publisher:

Abstract:

We propose the development of a World Wide Web image search engine that crawls the web collecting information about the images it finds, computes the appropriate image decompositions and indices, and stores this extracted information for searches based on image content. Indexing and searching images need not require solving the image understanding problem. Instead, the general approach should be to provide an arsenal of image decompositions and discriminants that can be precomputed for images. At search time, users can select a weighted subset of these decompositions to be used for computing image similarity measurements. While this approach avoids the search-time-dependent problem of labeling what is important in images, several important problems remain that require further research in the area of query by image content. We briefly explore some of these problems as they pertain to shape.
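
The search-time weighting could look roughly like the sketch below; the descriptor names and weights are invented for illustration only:

    import numpy as np

    def combined_distance(query_feats, image_feats, weights):
        """Weighted sum of per-descriptor distances over the user-selected subset."""
        total = 0.0
        for name, w in weights.items():
            if w > 0.0:
                total += w * float(np.linalg.norm(query_feats[name] - image_feats[name]))
        return total

    # Hypothetical user-selected weighting of precomputed decompositions:
    # weights = {"color_histogram": 0.7, "texture": 0.3, "shape": 0.0}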

Relevance:

30.00%

Publisher:

Abstract:

ImageRover is a search-by-image-content navigation tool for the World Wide Web. The staggering size of the WWW dictates certain strategies and algorithms for image collection, digestion, indexing, and user interface. This paper describes two key components of the ImageRover strategy: image digestion and relevance feedback. Image digestion occurs during image collection; robots digest the images they find, computing image decompositions and indices, and storing this extracted information in vector form for searches based on image content. Relevance feedback occurs during index search; users can iteratively guide the search through the selection of relevant examples. ImageRover employs a novel relevance feedback algorithm to determine the weighted combination of image similarity metrics appropriate for a particular query. ImageRover is available and running on the Web.
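
One common way to turn relevance feedback into metric weights (an illustration only, not necessarily ImageRover's exact rule) is to weight each feature inversely to its variance over the user's relevant examples, so that features the relevant images agree on dominate the combined similarity measure:

    import numpy as np

    def feedback_weights(relevant_vectors):
        """relevant_vectors: (n_examples, n_features) array of feature vectors."""
        var = np.var(relevant_vectors, axis=0) + 1e-6  # avoid division by zero
        w = 1.0 / var
        return w / w.sum()                             # normalise weights to sum to 1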

Relevance:

30.00%

Publisher:

Abstract:

Many real-world image analysis problems, such as face recognition and hand pose estimation, involve recognizing a large number of classes of objects or shapes. Large margin methods, such as AdaBoost and Support Vector Machines (SVMs), often provide competitive accuracy rates, but at the cost of evaluating a large number of binary classifiers, thus making it difficult to apply such methods when thousands or millions of classes need to be recognized. This thesis proposes a filter-and-refine framework, whereby, given a test pattern, a small number of candidate classes can be identified efficiently at the filter step, and computationally expensive large margin classifiers are used to evaluate these candidates at the refine step. Two different filtering methods are proposed: ClassMap and OVA-VS (One-vs.-All classification using Vector Search). ClassMap is an embedding-based method that works for both boosted classifiers and SVMs and tends to map patterns and their associated classes close to each other in a vector space. OVA-VS maps OVA classifiers and test patterns to vectors based on the weights and outputs of the weak classifiers of the boosting scheme. At runtime, finding the strongest-responding OVA classifier becomes a classical vector search problem, where well-known methods can be used to gain efficiency. In our experiments, the proposed methods achieve significant speed-ups, in some cases up to two orders of magnitude, compared to exhaustive evaluation of all OVA classifiers. This was achieved in hand pose recognition and face recognition systems where the number of classes ranges from 535 to 48,600.
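
A generic filter-and-refine sketch, with a placeholder embedding and placeholder classifiers rather than ClassMap or OVA-VS themselves: a cheap vector comparison proposes a few candidate classes, and only their expensive one-vs-all classifiers are evaluated:

    import numpy as np

    def filter_and_refine(query_vec, class_vectors, classifiers, query_pattern, k=10):
        # Filter step: k nearest classes in the cheap embedding space.
        dists = np.linalg.norm(class_vectors - query_vec, axis=1)
        candidates = np.argsort(dists)[:k]
        # Refine step: run only the k expensive one-vs-all classifiers.
        scores = {int(c): classifiers[int(c)](query_pattern) for c in candidates}
        return max(scores, key=scores.get)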

Relevance:

30.00%

Publisher:

Abstract:

Some WWW image engines allow the user to form a query in terms of text keywords. To build the image index, keywords are extracted heuristically from HTML documents containing each image, and/or from the image URL and file headers. Unfortunately, text-based image engines have merely retro-fitted standard SQL database query methods, and it is difficult to include image cues within such a framework. On the other hand, visual statistics (e.g., color histograms) are often insufficient for helping users find desired images in a vast WWW index. By truly unifying textual and visual statistics, one would expect to get better results than either used separately. In this paper, we propose an approach that allows the combination of visual statistics with textual statistics in the vector space representation commonly used in query-by-image-content systems. Text statistics are captured in vector form using latent semantic indexing (LSI). The LSI index for an HTML document is then associated with each of the images contained therein. Visual statistics (e.g., color, orientedness) are also computed for each image. The LSI and visual statistic vectors are then combined into a single index vector that can be used for content-based search of the resulting image database. By using an integrated approach, we are able to take advantage of possible statistical couplings between the topic of the document (latent semantic content) and the contents of images (visual statistics). This allows improved performance in conducting content-based search. This approach has been implemented in a WWW image search engine prototype.
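
A minimal sketch of one way to fuse the two vector types (the normalisation and the single weight alpha are assumptions; the paper's exact combination scheme may differ):

    import numpy as np

    def combined_index(lsi_vec, visual_vec, alpha=0.5):
        """Concatenate unit-normalised text and visual vectors with relative weight alpha."""
        t = lsi_vec / (np.linalg.norm(lsi_vec) + 1e-12)
        v = visual_vec / (np.linalg.norm(visual_vec) + 1e-12)
        return np.concatenate([alpha * t, (1.0 - alpha) * v])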

Relevance:

30.00%

Publisher:

Abstract:

Latent semantic indexing (LSI) is a technique used for intelligent information retrieval (IR). It can be used as an alternative to traditional keyword-matching IR and is attractive in this respect because of its ability to overcome problems with synonymy and polysemy. This study investigates various aspects of LSI: the effect of the Haar wavelet transform (HWT) as a preprocessing step for the singular value decomposition (SVD) in the key stage of the LSI process, and the effect of different threshold types in the HWT on the search results. The developed method allows the visualisation and processing of the term-document matrix, generated in the LSI process, using the HWT. The results show that precision can be increased by applying the HWT as a preprocessing step, with better results for hard thresholding than soft thresholding, whereas standard SVD-based LSI remains the most effective way of searching in terms of recall.
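
A sketch of the pipeline described above, using PyWavelets and NumPy; the wavelet level, threshold value, and retained rank are assumptions, not the study's settings:

    import numpy as np
    import pywt

    def hwt_then_lsi(term_doc, threshold=0.1, rank=50):
        """Hard-threshold a Haar wavelet transform of the term-document matrix, then truncated SVD."""
        # Haar transform along the term axis, hard-thresholded coefficient by coefficient.
        coeffs = pywt.wavedec(term_doc, "haar", axis=0)
        coeffs = [np.where(np.abs(c) > threshold, c, 0.0) for c in coeffs]
        denoised = pywt.waverec(coeffs, "haar", axis=0)[: term_doc.shape[0]]
        # Truncated SVD, as in standard LSI.
        U, s, Vt = np.linalg.svd(denoised, full_matrices=False)
        return U[:, :rank], s[:rank], Vt[:rank]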

Relevance:

30.00%

Publisher:

Abstract:

A novel framework for multimodal semantic-associative collateral image labelling, aiming at associating image regions with textual keywords, is described. Both the primary image and collateral textual modalities are exploited in a cooperative and complementary fashion. The collateral content- and context-based knowledge is used to bias the mapping from the low-level region-based visual primitives to the high-level visual concepts defined in a visual vocabulary. We introduce the notion of collateral context, which is represented as a co-occurrence matrix of the visual keywords. A collaborative mapping scheme is devised using statistical methods such as the Gaussian distribution or Euclidean distance, together with a collateral content- and context-driven inference mechanism. Finally, we use Self-Organising Maps to examine the classification and retrieval effectiveness of the proposed high-level image feature vector model, which is constructed based on the image labelling results.
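
As a small illustration of the collateral-context representation (keyword names and the counting scheme are assumptions), a co-occurrence matrix of visual keywords can be accumulated over labelled images like this:

    import numpy as np

    def cooccurrence_matrix(image_keyword_sets, vocabulary):
        """Count how often pairs of visual keywords appear in the same image."""
        idx = {kw: i for i, kw in enumerate(vocabulary)}
        C = np.zeros((len(vocabulary), len(vocabulary)), dtype=int)
        for keywords in image_keyword_sets:
            present = [idx[k] for k in keywords if k in idx]
            for i in present:
                for j in present:
                    if i != j:
                        C[i, j] += 1
        return C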

Relevance:

30.00%

Publisher:

Abstract:

The utilization of massive multimedia document collections, such as multimedia documents on the global Internet, requires search engines that can rank using both text and image evidence. The massive size and dynamic nature of such collections can make manual indexing prohibitively expensive. Traditional search engines utilize only the text components of multimedia documents, but there are information needs that require the utilization of image evidence. In this paper, we investigate image features for ranking in large and heterogeneous collections. Both the nature and complexity of information needs are key elements for effective retrieval. Retrieval needs that depend on perceptual similarities (as found in art galleries or building architecture) require the utilization of visual cues. In such situations, ranking multimedia documents based on image evidence can provide higher effectiveness. Experimental results show that, where perceptual similarities are key elements for retrieval, ranking based on image features can be more effective than ranking based on text alone.

Relevance:

30.00%

Publisher:

Abstract:

Traditional image retrieval systems are content-based image retrieval (CBIR) systems, which rely on low-level features for indexing and retrieval of images. CBIR systems fail to meet user expectations because of the gap between the low-level features used by such systems and the high-level perception of images by humans. Semantics-based methods have been used to describe images according to their high-level features. In this paper, we performed experiments to identify the failure of existing semantics-based methods to retrieve images in a particular semantic category. We have proposed a new semantic category to describe the intra-region color feature. The proposed semantic category complements the existing high-level descriptions. Experimental results confirm the effectiveness of the proposed method.
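
Purely as an illustration of what an intra-region color statistic might look like (the paper's actual feature is not specified here, so this is an assumption), one could summarise the color content of a segmented region as follows:

    import numpy as np

    def intra_region_color_stats(image_rgb, region_mask):
        """Mean color and color spread inside one segmented region."""
        pixels = image_rgb[region_mask > 0].astype(float)  # (n, 3) RGB values
        mean_color = pixels.mean(axis=0)
        spread = float(pixels.std(axis=0).mean())          # rough uniformity measure
        return mean_color, spread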

Relevance:

30.00%

Publisher:

Abstract:

This work combines natural language understanding and image processing with incremental learning to develop a system that can automatically interpret and index American Football. We have developed a model for representing the spatio-temporal characteristics of multiple objects in dynamic scenes in this domain. Our representation combines expert knowledge, domain knowledge, spatial knowledge and temporal knowledge. We also present an incremental learning algorithm to improve the knowledge base as well as to keep previously developed concepts consistent with new data. The advantages of the incremental learning algorithm are that it does not split concepts and that it generates a compact conceptual hierarchy which does not store instances.

Relevance:

30.00%

Publisher:

Abstract:

This work constitutes the first attempt to extract an important narrative structure, the 3-Act storytelling paradigm, in film. This narrative structure is prevalent in the domain of film, as it forms the foundation and framework within which the film can be made to function as an effective tool for storytelling, and its extraction is a vital step in automatic content management for film data. A novel act boundary likelihood function for Act 1 is derived using a Bayesian formulation under guidance from film grammar, tested under many configurations, and the results are reported for experiments involving 25 full-length movies. The formulation is shown to be a useful tool in both the automatic and semi-interactive setting for semantic analysis of film.

Relevance:

30.00%

Publisher:

Abstract:

Composing a multimedia presentation may require the creation or generation of suitable images and video segments, as well as animation, sound, or special effects. Obtaining images or video sequences can be prohibitively expensive when the costs of travel to location, equipment, staff, etc., are considered. These problems can be alleviated with the use of pictorial and video digital libraries; such libraries require methods for comprehensive indexing and annotation of stored items, together with efficient retrieval tools.

We propose a system based on user-oriented perceptions as they influence query formation in image and video retrieval. We present a method based on user-dependent conceptual structures for creating and maintaining indexes to images and video sequences.