49 resultados para Multi-scale place recognition
Resumo:
This paper presents a novel coarse-to-fine global localization approach inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by scale-invariant transformation feature descriptors are used as natural landmarks. They are indexed into two databases: a location vector space model (LVSM) and a location database. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the LVSM is fast, but not accurate enough, whereas localization from the location database using a voting algorithm is relatively slow, but more accurate. The integration of coarse and fine stages makes fast and reliable localization possible. If necessary, the localization result can be verified by epipolar geometry between the representative view in the database and the view to be localized. In addition, the localization system recovers the position of the camera by essential matrix decomposition. The localization system has been tested in indoor and outdoor environments. The results show that our approach is efficient and reliable. © 2006 IEEE.
Resumo:
In this paper, we propose a vision based mobile robot localization strategy. Local scale-invariant features are used as natural landmarks in unstructured and unmodified environment. The local characteristics of the features we use prove to be robust to occlusion and outliers. In addition, the invariance of the features to viewpoint change makes them suitable landmarks for mobile robot localization. Scale-invariant features detected in the first exploration are indexed into a location database. Indexing and voting allow efficient recognition of global localization. The localization result is verified by epipolar geometry between the representative view in database and the view to be localized, thus the probability of false localization will be decreased. The localization system can recover the pose of the camera mounted on the robot by essential matrix decomposition. Then the position of the robot can be computed easily. Both calibrated and un-calibrated cases are discussed and relative position estimation based on calibrated camera turns out to be the better choice. Experimental results show that our approach is effective and reliable in the case of illumination changes, similarity transformations and extraneous features. © 2004 IEEE.
Resumo:
This paper describes results obtained using the modified Kanerva model to perform word recognition in continuous speech after being trained on the multi-speaker Alvey 'Hotel' speech corpus. Theoretical discoveries have recently enabled us to increase the speed of execution of part of the model by two orders of magnitude over that previously reported by Prager & Fallside. The memory required for the operation of the model has been similarly reduced. The recognition accuracy reaches 95% without syntactic constraints when tested on different data from seven trained speakers. Real time simulation of a model with 9,734 active units is now possible in both training and recognition modes using the Alvey PARSIFAL transputer array. The modified Kanerva model is a static network consisting of a fixed nonlinear mapping (location matching) followed by a single layer of conventional adaptive links. A section of preprocessed speech is transformed by the non-linear mapping to a high dimensional representation. From this intermediate representation a simple linear mapping is able to perform complex pattern discrimination to form the output, indicating the nature of the speech features present in the input window.
Resumo:
Over recent years we have developed and published research aimed at producing a meshing, geometry editing and simulation system capable of handling large scale, real world applications and implemented in an end-to-end parallel, scalable manner. The particular focus of this paper is the extension of this meshing system to include conjugate meshes for multi-physics simulations. Two contrasting applications are presented: export of a body-conformal mesh to drive a commercial, third-party simulation system; and direct use of the cut-Cartesian octree mesh with a single, integrated, close-coupled multi-physics simulation system. Copyright © 2010 by W.N.Dawes.
Resumo:
We present a novel, implementation friendly and occlusion aware semi-supervised video segmentation algorithm using tree structured graphical models, which delivers pixel labels alongwith their uncertainty estimates. Our motivation to employ supervision is to tackle a task-specific segmentation problem where the semantic objects are pre-defined by the user. The video model we propose for this problem is based on a tree structured approximation of a patch based undirected mixture model, which includes a novel time-series and a soft label Random Forest classifier participating in a feedback mechanism. We demonstrate the efficacy of our model in cutting out foreground objects and multi-class segmentation problems in lengthy and complex road scene sequences. Our results have wide applicability, including harvesting labelled video data for training discriminative models, shape/pose/articulation learning and large scale statistical analysis to develop priors for video segmentation. © 2011 IEEE.
Resumo:
This paper introduces the Interlevel Product (ILP) which is a transform based upon the Dual-Tree Complex Wavelet. Coefficients of the ILP have complex values whose magnitudes indicate the amplitude of multilevel features, and whose phases indicate the nature of these features (e.g. ridges vs. edges). In particular, the phases of ILP coefficients are approximately invariant to small shifts in the original images. We accordingly introduce this transform as a solution to coarse scale template matching, where alignment concerns between decimation of a target and decimation of a larger search image can be mitigated, and computational efficiency can be maintained. Furthermore, template matching with ILP coefficients can provide several intuitive "near-matches" that may be of interest in image retrieval applications. © 2005 IEEE.
Resumo:
In this paper, a novel cortex-inspired feed-forward hierarchical object recognition system based on complex wavelets is proposed and tested. Complex wavelets contain three key properties for object representation: shift invariance, which enables the extraction of stable local features; good directional selectivity, which simplifies the determination of image orientations; and limited redundancy, which allows for efficient signal analysis using the multi-resolution decomposition offered by complex wavelets. In this paper, we propose a complete cortex-inspired object recognition system based on complex wavelets. We find that the implementation of the HMAX model for object recognition in [1, 2] is rather over-complete and includes too much redundant information and processing. We have optimized the structure of the model to make it more efficient. Specifically, we have used the Caltech 5 standard dataset to compare with Serre's model in [2] (which employs Gabor filter bands). Results demonstrate that the complex wavelet model achieves a speed improvement of about 4 times over the Serre model and gives comparable recognition performance. © 2011 IEEE.
Resumo:
Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character level language models (LMs) can be used as a first level approximation to allowed syllable sequences. To test this idea, word and character level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of texts from a wide collection of text sources. Both hypothesis and model based combination techniques were investigated to combine word and character level LMs. Significant character error rate reductions up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history dependent multi-level LM that performs a log-linearly combination of character and word level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.
Resumo:
In the modern and dynamic construction environment it is important to access information in a fast and efficient manner in order to improve the decision making processes for construction managers. This capability is, in most cases, straightforward with today’s technologies for data types with an inherent structure that resides primarily on established database structures like estimating and scheduling software. However, previous research has demonstrated that a significant percentage of construction data is stored in semi-structured or unstructured data formats (text, images, etc.) and that manually locating and identifying such data is a very hard and time-consuming task. This paper focuses on construction site image data and presents a novel image retrieval model that interfaces with established construction data management structures. This model is designed to retrieve images from related objects in project models or construction databases using location, date, and material information (extracted from the image content with pattern recognition techniques).
Resumo:
Amplitude demodulation is an ill-posed problem and so it is natural to treat it from a Bayesian viewpoint, inferring the most likely carrier and envelope under probabilistic constraints. One such treatment is Probabilistic Amplitude Demodulation (PAD), which, whilst computationally more intensive than traditional approaches, offers several advantages. Here we provide methods for estimating the uncertainty in the PAD-derived envelopes and carriers, and for learning free-parameters like the time-scale of the envelope. We show how the probabilistic approach can naturally handle noisy and missing data. Finally, we indicate how to extend the model to signals which contain multiple modulators and carriers.