982 results for visual object categorization


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them manage their collections and find specific images. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. Because the system is unsupervised, it does not need known training images, which would have to be obtained manually. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for each image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on the visual appearance of the images. Images that have similar image regions are grouped together in the same category. Thus, for example, images which contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.
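
The final grouping step described above (images categorized from their feature vectors, without labels) can be sketched roughly as follows. This is only an illustration: the abstract does not say which clustering method the thesis uses, so plain k-means is an assumption here, and the function names and toy vectors are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Group vectors into k clusters; returns one cluster index per vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # assumption: initialise from random data points
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical per-image feature vectors (e.g. codeword counts).
image_vectors = [
    [9, 1, 0], [8, 2, 0], [10, 0, 1],  # images dominated by the first codeword
    [0, 1, 9], [1, 0, 8], [0, 2, 10],  # images dominated by the third codeword
]
labels = kmeans(image_vectors, k=2)
print(labels)
```

Images with similar feature vectors end up with the same label, which is the sense in which, for example, car images would share a cluster.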

Abstract:

Local features are used in many computer vision tasks, including visual object categorization, content-based image retrieval and object recognition, to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of the extracted local features, the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis, the performance of local feature detectors and descriptors is evaluated for the object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched for using random image pairs of the same class. The goal of this thesis is to find the best detector and descriptor methods for this task in terms of detector repeatability and descriptor matching rate.
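
One way a descriptor matching rate could be computed is sketched below. The abstract does not define the measure, so this assumes a simple version: the fraction of descriptors in one image whose nearest neighbour in the other image is the known correct correspondence. The function names, the toy descriptors and the ground-truth list are all hypothetical.

```python
def nearest_neighbor(query, candidates):
    """Index of the candidate descriptor closest to the query (squared Euclidean)."""
    return min(
        range(len(candidates)),
        key=lambda i: sum((q - c) ** 2 for q, c in zip(query, candidates[i])),
    )

def matching_rate(descriptors_a, descriptors_b, ground_truth):
    """Fraction of descriptors in A whose nearest neighbour in B is the true match.

    ground_truth[i] is the index in B that descriptor i of A should match.
    """
    correct = sum(
        1
        for i, descriptor in enumerate(descriptors_a)
        if nearest_neighbor(descriptor, descriptors_b) == ground_truth[i]
    )
    return correct / len(descriptors_a)

# Toy 2-D descriptors for an image pair of the same class.
descriptors_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
descriptors_b = [[0.9, 0.1], [0.1, 0.9], [0.0, 0.0]]
ground_truth = [0, 1, 2]

rate = matching_rate(descriptors_a, descriptors_b, ground_truth)
print(rate)
```

In this toy pair the third descriptor is matched to the wrong candidate, so two of three matches are correct.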

Abstract:

Single-trial encounters with multisensory stimuli affect both memory performance and early-latency brain responses to visual stimuli. Whether and how auditory cortices support memory processes based on single-trial multisensory learning is unknown and may differ qualitatively and quantitatively from comparable processes within visual cortices due to purported differences in memory capacities across the senses. We recorded event-related potentials (ERPs) as healthy adults (n = 18) performed a continuous recognition task in the auditory modality, discriminating initial (new) from repeated (old) sounds of environmental objects. Initial presentations were either unisensory or multisensory; the latter entailed synchronous presentation of a semantically congruent or a meaningless image. Repeated presentations were exclusively auditory, thus differing only according to the context in which the sound was initially encountered. Discrimination abilities (indexed by d') were increased for repeated sounds that were initially encountered with a semantically congruent image versus sounds initially encountered with either a meaningless or no image. Analyses of ERPs within an electrical neuroimaging framework revealed that early stages of auditory processing of repeated sounds were affected by prior single-trial multisensory contexts. These effects followed from significantly reduced activity within a distributed network, including the right superior temporal cortex, suggesting an inverse relationship between brain activity and behavioural outcome on this task. The present findings demonstrate how auditory cortices contribute to long-term effects of multisensory experiences on auditory object discrimination. We propose a new framework for the efficacy of multisensory processes to impact both current multisensory stimulus processing and unisensory discrimination abilities later in time.
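
For readers unfamiliar with the sensitivity index d' used above, the standard signal-detection definition is the difference between the z-transformed hit rate and false-alarm rate. The sketch below assumes a hit is an "old" response to a repeated sound and a false alarm is an "old" response to a new sound; the trial counts are made up for illustration and are not data from the study.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 are not handled here; the inverse normal CDF
    is undefined there and a correction would be needed.
    """
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical counts: 40 of 50 repeated sounds called "old",
# 10 of 50 new sounds called "old".
sensitivity = d_prime(hits=40, misses=10, false_alarms=10, correct_rejections=40)
print(round(sensitivity, 3))
```

A larger d' means repeated and new sounds are easier to tell apart, independent of any overall bias toward answering "old".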

Abstract:

Multisensory memory traces established via single-trial exposures can impact subsequent visual object recognition. This impact appears to depend on the meaningfulness of the initial multisensory pairing, implying that multisensory exposures establish distinct object representations that are accessible during later unisensory processing. Multisensory contexts may be particularly effective in influencing auditory discrimination, given the purportedly inferior recognition memory in this sensory modality. The possibility of this generalization and the equivalence of effects when memory discrimination was being performed in the visual vs. auditory modality were the focus of this study. First, we demonstrate that visual object discrimination is affected by the context of prior multisensory encounters, replicating and extending previous findings by controlling for the probability of multisensory contexts during initial as well as repeated object presentations. Second, we provide the first evidence that single-trial multisensory memories impact subsequent auditory object discrimination. Auditory object discrimination was enhanced when initial presentations entailed semantically congruent multisensory pairs and was impaired after semantically incongruent multisensory encounters, compared to sounds that had been encountered only in a unisensory manner. Third, the impact of single-trial multisensory memories upon unisensory object discrimination was greater when the task was performed in the auditory vs. visual modality. Fourth, there was no evidence for correlation between effects of past multisensory experiences on visual and auditory processing, suggestive of largely independent object processing mechanisms between modalities. We discuss these findings in terms of the conceptual short term memory (CSTM) model and predictive coding. Our results suggest differential recruitment and modulation of conceptual memory networks according to the sensory task at hand.

Abstract:

The large and growing number of digital images is making manual image search laborious. Only a fraction of the images contain metadata that can be used to search for a particular type of image. Thus, the main research question of this thesis is whether it is possible to learn visual object categories directly from images. Computers process images as long lists of pixels that do not have a clear connection to high-level semantics which could be used in the image search. There are various methods introduced in the literature to extract low-level image features and also approaches to connect these low-level features with high-level semantics. One of these approaches, called Bag-of-Features, is studied in the thesis. In the Bag-of-Features approach, the images are described using a visual codebook. The codebook is built from the descriptions of the image patches using clustering. The images are described by matching descriptions of image patches with the visual codebook and computing the number of matches for each code. In this thesis, unsupervised visual object categorisation using the Bag-of-Features approach is studied. The goal is to find groups of similar images, e.g., images that contain an object from the same category. The standard Bag-of-Features approach is improved by using spatial information and visual saliency. It was found that the performance of the visual object categorisation can be improved by using spatial information of local features to verify the matches. However, this process is computationally heavy, and thus, the number of images must be limited in the spatial matching, for example, by using the Bag-of-Features method as in this study. Different approaches for saliency detection are studied and a new method based on the Hessian-Affine local feature detector is proposed. The new method achieves results comparable with the current state of the art.
The visual object categorisation performance was improved by using foreground segmentation based on saliency information, especially when the background could be considered as clutter.
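
The image description step of Bag-of-Features (matching patch descriptions against the codebook and counting matches per code) can be sketched as below. This is a minimal sketch under assumptions: nearest-code matching by Euclidean distance is assumed, the codebook is taken as given rather than built by clustering, and the two-dimensional descriptors are toy values.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_of_features_histogram(patch_descriptors, codebook):
    """Count, for each code, how many patch descriptors match it best."""
    histogram = [0] * len(codebook)
    for descriptor in patch_descriptors:
        nearest_code = min(
            range(len(codebook)),
            key=lambda k: squared_distance(descriptor, codebook[k]),
        )
        histogram[nearest_code] += 1
    return histogram

# Hypothetical codebook with two codes, and three patch descriptors from one image.
codebook = [[0.0, 0.0], [10.0, 10.0]]
patches = [[1.0, 0.0], [0.0, 2.0], [9.0, 9.0]]

histogram = bag_of_features_histogram(patches, codebook)
print(histogram)
```

The resulting histogram discards where the patches were located in the image, which is why the thesis adds spatial information as a separate verification step.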

Abstract:

This thesis addresses the problem of categorizing natural objects. To provide a criterion for categorization we propose that the purpose of a categorization is to support the inference of unobserved properties of objects from the observed properties. Because no such set of categories can be constructed in an arbitrary world, we present the Principle of Natural Modes as a claim about the structure of the world. We first define an evaluation function that measures how well a set of categories supports the inference goals of the observer. Entropy measures for property uncertainty and category uncertainty are combined through a free parameter that reflects the goals of the observer. Natural categorizations are shown to be those that are stable with respect to this free parameter. The evaluation function is tested in the domain of leaves and is found to be sensitive to the structure of the natural categories corresponding to the different species. We next develop a categorization paradigm that utilizes the categorization evaluation function in recovering natural categories. A statistical hypothesis generation algorithm is presented that is shown to be an effective categorization procedure. Examples drawn from several natural domains are presented, including data known to be a difficult test case for numerical categorization techniques. We next extend the categorization paradigm such that multiple levels of natural categories are recovered; by means of recursively invoking the categorization procedure both the genera and species are recovered in a population of anaerobic bacteria. Finally, a method is presented for evaluating the utility of features in recovering natural categories. This method also provides a mechanism for determining which features are constrained by the different processes present in a multiple modal world.

Abstract:

The report describes a recognition system called GROPER, which performs grouping by using distance and relative orientation constraints that estimate the likelihood of different edges in an image coming from the same object. The thesis presents both a theoretical analysis of the grouping problem and a practical implementation of a grouping system. GROPER also uses an indexing module to allow it to make use of knowledge of different objects, any of which might appear in an image. We test GROPER by comparing it to a similar recognition system that does not use grouping.

Abstract:

The HMAX model has recently been proposed by Riesenhuber & Poggio as a hierarchical model of position- and size-invariant object recognition in visual cortex. It has also turned out to model successfully a number of other properties of the ventral visual stream (the visual pathway thought to be crucial for object recognition in cortex), and particularly of (view-tuned) neurons in macaque inferotemporal cortex, the brain area at the top of the ventral stream. The original modeling study only used "paperclip" stimuli, as in the corresponding physiology experiment, and did not explore systematically how model units' invariance properties depended on model parameters. In this study, we aimed at a deeper understanding of the inner workings of HMAX and its performance for various parameter settings and "natural" stimulus classes. We examined HMAX responses for different stimulus sizes and positions systematically and found a dependence of model units' responses on stimulus position for which a quantitative description is offered. Interestingly, we find that scale invariance properties of hierarchical neural models are not independent of stimulus class, as opposed to translation invariance, even though both are affine transformations within the image plane.
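
The basic mechanism behind position invariance in HMAX-style models, template matching followed by a maximum over positions, can be illustrated with a one-dimensional toy. This is a deliberately simplified sketch and not the model studied in the report: it assumes a single template, Gaussian tuning, and pooling over every position, and all names and stimuli are hypothetical.

```python
import math

def s_layer(signal, template, sigma=1.0):
    """Template-matching responses (Gaussian tuning) at every position."""
    n = len(template)
    return [
        math.exp(
            -sum((signal[i + j] - template[j]) ** 2 for j in range(n))
            / (2 * sigma ** 2)
        )
        for i in range(len(signal) - n + 1)
    ]

def c_layer(s_responses):
    """Pool with a maximum over positions, discarding where the match occurred."""
    return max(s_responses)

template = [0, 1, 0]
stimulus_left = [0, 1, 0, 0, 0, 0]   # feature near the left edge
stimulus_right = [0, 0, 0, 0, 1, 0]  # same feature shifted to the right

response_left = c_layer(s_layer(stimulus_left, template))
response_right = c_layer(s_layer(stimulus_right, template))
print(response_left, response_right)
```

Because this toy pools over all positions, the pooled response is identical for both stimuli. A unit that pools over a limited range or a coarse grid of positions would not be perfectly invariant, which is one way a dependence on stimulus position like the one reported above can arise.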

Abstract:

Numerous psychophysical experiments have shown an important role for attentional modulations in vision. Behaviorally, allocation of attention can improve performance in object detection and recognition tasks. At the neural level, attention increases firing rates of neurons in visual cortex whose preferred stimulus is currently attended to. However, it is not yet known how these two phenomena are linked, i.e., how the visual system could be "tuned" in a task-dependent fashion to improve task performance. To answer this question, we performed simulations with the HMAX model of object recognition in cortex [45]. We modulated firing rates of model neurons in accordance with experimental results about effects of feature-based attention on single neurons and measured changes in the model's performance in a variety of object recognition tasks. It turned out that recognition performance could only be improved under very limited circumstances and that attentional influences on the process of object recognition per se tend to display a lack of specificity or raise false alarm rates. These observations lead us to postulate a new role for the observed attention-related neural response modulations.

Abstract:

This thesis presents three important results in visual object recognition based on shape. (1) A new algorithm (RAST; Recognition by Adaptive Subdivisions of Transformation space) is presented that has lower average-case complexity than any known recognition algorithm. (2) It is shown, both theoretically and empirically, that representing 3D objects as collections of 2D views (the "View-Based Approximation") is feasible and affects the reliability of 3D recognition systems no more than other commonly made approximations. (3) The problem of recognition in cluttered scenes is considered from a Bayesian perspective; the commonly used "bounded-error measure" is demonstrated to correspond to an independence assumption. It is shown that by modeling the statistical properties of real scenes better, objects can be recognized more reliably.
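
The idea named in the RAST acronym, adaptively subdividing transformation space and keeping the regions that could still contain a good match, can be sketched in a heavily reduced setting. The code below is an illustrative toy and not the thesis's algorithm: it assumes one-dimensional points, translation as the only transformation, and a bounded-error quality measure (the number of model points that land within `eps` of some image point). All names and data are hypothetical.

```python
import heapq

def match_bound(model, image, t_center, half_width, eps):
    """Upper bound on the match count for any translation in the interval.

    A model point can match under some translation in the interval only if
    it is within eps + half_width of an image point at the interval center.
    """
    return sum(
        1
        for m in model
        if any(abs(m + t_center - p) <= eps + half_width for p in image)
    )

def rast_translation(model, image, t_min, t_max, eps, tol=1e-3):
    """Best-first subdivision of the translation interval [t_min, t_max]."""
    # Heap entries are (-bound, lo, hi) so the most promising interval pops first.
    heap = [(-len(model), t_min, t_max)]
    while heap:
        neg_bound, lo, hi = heapq.heappop(heap)
        if hi - lo <= tol:
            # Small enough: report the interval center and its bound.
            return (lo + hi) / 2, -neg_bound
        mid = (lo + hi) / 2
        for a, b in ((lo, mid), (mid, hi)):
            bound = match_bound(model, image, (a + b) / 2, (b - a) / 2, eps)
            heapq.heappush(heap, (-bound, a, b))
    return None

model_points = [0.0, 1.0, 3.0]
image_points = [5.0, 6.0, 8.0, 20.0]  # model shifted by 5, plus one clutter point

translation, matches = rast_translation(model_points, image_points, 0.0, 16.0, eps=0.25)
print(translation, matches)
```

Intervals whose bound is low are never subdivided, so most of the search effort goes to the neighbourhood of the true translation. The returned translation is accurate only up to the error bound plus the subdivision tolerance.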

Abstract:

This workshop paper reports recent developments to a vision system for traffic interpretation which relies extensively on the use of geometrical and scene context. Firstly, a new approach to pose refinement is reported, based on forces derived from prominent image derivatives found close to an initial hypothesis. Secondly, a parameterised vehicle model is reported, able to represent different vehicle classes. This general vehicle model has been fitted to sample data, and subjected to a Principal Component Analysis to create a deformable model of common car types having 6 parameters. We show that the new pose recovery technique is also able to operate on the PCA model, to allow the structure of an initial vehicle hypothesis to be adapted to fit the prevailing context. We report initial experiments with the model, which demonstrate significant improvements to pose recovery.

Abstract:

Previous functional imaging studies have shown that facilitated processing of a visual object on repeated, relative to initial, presentation (i.e., repetition priming) is associated with reductions in neural activity in multiple regions, including fusiform/lateral occipital cortex. Moreover, activity reductions have been found, at diminished levels, when a different exemplar of an object is presented on repetition. In one previous study, the magnitude of diminished priming across exemplars was greater in the right relative to the left fusiform, suggesting greater exemplar specificity in the right. Another previous study, however, observed fusiform lateralization modulated by object viewpoint, but not object exemplar. The present fMRI study sought to determine whether the result of differential fusiform responses for perceptually different exemplars could be replicated. Furthermore, the role of the left fusiform cortex in object recognition was investigated via the inclusion of a lexical/semantic manipulation. Right fusiform cortex showed a significantly greater effect of exemplar change than left fusiform, replicating the previous result of exemplar-specific fusiform lateralization. Right fusiform and lateral occipital cortex were not differentially engaged by the lexical/semantic manipulation, suggesting that their role in visual object recognition is predominantly in the visual discrimination of specific objects. Activation in left fusiform cortex, but not left lateral occipital cortex, was modulated by both exemplar change and lexical/semantic manipulation, with further analysis suggesting a posterior-to-anterior progression between regions involved in processing visuoperceptual and lexical/semantic information about objects. The results are consistent with the view that the right fusiform plays a greater role in processing specific visual form information about objects, whereas the left fusiform is also involved in lexical/semantic processing.
(C) 2003 Elsevier Science (USA). All rights reserved.

Abstract:

Primate multisensory object perception involves distributed brain regions. To investigate the network character of these regions of the human brain, we applied data-driven group spatial independent component analysis (ICA) to a functional magnetic resonance imaging (fMRI) data set acquired during a passive audio-visual (AV) experiment with common object stimuli. We labeled three group-level independent component (IC) maps as auditory (A), visual (V), and AV, based on their spatial layouts and activation time courses. The overlap between these IC maps served as definition of a distributed network of multisensory candidate regions including superior temporal, ventral occipito-temporal, posterior parietal and prefrontal regions. During an independent second fMRI experiment, we explicitly tested their involvement in AV integration. Activations in nine out of these twelve regions met the max-criterion (A < AV > V) for multisensory integration. Comparison of this approach with a general linear model-based region-of-interest definition revealed its complementary value for multisensory neuroimaging. In conclusion, we estimated functional networks of uni- and multisensory functional connectivity from one dataset and validated their functional roles in an independent dataset. These findings demonstrate the particular value of ICA for multisensory neuroimaging research and using independent datasets to test hypotheses generated from a data-driven analysis.
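
The max-criterion (A < AV > V) mentioned above requires the audio-visual response to exceed both unisensory responses. The sketch below shows only the logical form of the criterion; it assumes one summary response value per condition and region, whereas the study itself would assess this statistically. The function name and the numbers are hypothetical.

```python
def meets_max_criterion(auditory, visual, audiovisual):
    """True if the AV response exceeds the larger of the two unisensory responses."""
    return audiovisual > max(auditory, visual)

# Hypothetical response estimates (arbitrary units) for two candidate regions.
region_1 = {"auditory": 1.0, "visual": 1.2, "audiovisual": 1.8}
region_2 = {"auditory": 1.0, "visual": 1.2, "audiovisual": 1.1}

print(meets_max_criterion(**region_1))
print(meets_max_criterion(**region_2))
```

The second region responds more to AV stimulation than to sound alone, but not more than to vision alone, so it does not meet the criterion.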

Abstract:

A low-complexity but highly efficient object-counting algorithm is presented that can be embedded in hardware with low computational power. This is achieved by a novel soft-data association strategy that can handle multimodal distributions.

Abstract:

This work addresses the problem of object tracking, whose goal is to find the trajectory of an object in a video sequence. To this end, a tracking-by-detection method has been developed that builds an appearance model in a compressed domain using a new technique: compressive sensing. The only information required is the location of the object to be tracked in the first frame of the sequence. Object tracking is a typical computer vision application that has been under development for many years. Even so, it remains a challenging task because of several factors that must be considered to achieve robust tracking: illumination changes, partial or total occlusion of the objects, and the complexity of the scene background. To deal with these factors as effectively as possible, we propose a tracking algorithm that trains a Support Vector Machine (SVM) classifier online to separate the objects from the scene background. To this end, we build our appearance model with a very robust feature descriptor that describes the objects and the background and returns a vector of very high dimensionality. A subsequent step therefore reduces the dimensionality of these vectors so that our classifier can be trained in a much smaller domain, which we call the compressed domain. The dimensionality reduction of the feature vectors is based on the theory of compressive sensing, which states that a sparse signal (one with few nonzero components) can be well represented, and even reconstructed, from a very small set of samples.
The theory of compressive sensing has been applied successfully in this work, and different measurement and reconstruction techniques have been tested to evaluate our reduced vectors, verifying that they preserve the information of the original vectors. We also include an update of the appearance model of the tracked object by retraining our classifier at every frame of the sequence with positive and negative samples, which are obtained from the position predicted by the tracking algorithm at each time instant. The proposed algorithm has been evaluated on different sequences and compared with other state-of-the-art tracking algorithms to demonstrate the success of our method.
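
The measurement step of compressive sensing, projecting a sparse high-dimensional vector onto a small number of random measurements, can be sketched as below. This shows only the dimensionality reduction, not reconstruction or tracking. The abstract says several measurement techniques were tested without naming them, so the random Gaussian matrix used here is an assumption, as are the dimensions, names and the toy sparse vector.

```python
import random

def measurement_matrix(m, n, seed=0):
    """Random m-by-n matrix with zero-mean Gaussian entries of variance 1/m."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / m ** 0.5) for _ in range(n)] for _ in range(m)]

def compress(phi, x):
    """Compressed measurements y = phi * x (one dot product per matrix row)."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]

n, m = 1000, 50  # original and compressed dimensions (illustrative values)
phi = measurement_matrix(m, n)

# A sparse feature vector: only three of its 1000 components are nonzero.
x = [0.0] * n
for index in (3, 250, 999):
    x[index] = 1.0

y = compress(phi, x)
print(len(x), "->", len(y))
```

A classifier such as the SVM described above would then be trained on vectors like `y` instead of `x`. The same matrix `phi` must be reused for every feature vector so that all samples live in the same compressed domain.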