982 results for visual object categorization


Relevance:

80.00%

Publisher:

Abstract:

This thesis is about the detection of local image features. The research topic belongs to the wider area of object detection, a machine vision and pattern recognition problem in which an object must be detected (located) in an image. State-of-the-art object detection methods often divide the problem into separate interest point detection and local image description steps, but this thesis uses a different technique, leading to higher-quality image features that enable more precise localization. Instead of using interest point detection, the landmark positions are marked manually. Therefore, the quality of the image features is not limited by the interest point detection phase, and the learning of image features is simplified. The approach combines interest point detection and local description into one phase for detection. Computational efficiency of the descriptor is therefore important, which rules out many commonly used descriptors as too heavy. Multiresolution Gabor features have been the main descriptor in this thesis, and improving their efficiency is a significant part of the work. The actual image features are formed from descriptors by using a classifier, which can then recognize similar-looking patches in new images. The main classifier is based on Gaussian mixture models. Classifiers are used in a one-class configuration, in which there are only positive training samples and no explicit background class. The local image feature detection method has been tested with two freely available face detection databases and a proprietary license plate database. The localization performance was very good in these experiments. Other applications applying the same underlying techniques are also presented, including object categorization and fault detection.
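The one-class use of a Gaussian mixture model described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the diagonal covariances, the toy two-dimensional descriptors, the mixture parameters, and the rule for picking the threshold from positive samples are all assumptions made for illustration.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture (log-sum-exp for stability)."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def threshold_from_positives(samples, weights, means, variances, keep=0.95):
    """Pick the score that retains roughly `keep` of the positive samples."""
    scores = sorted(gmm_log_likelihood(s, weights, means, variances) for s in samples)
    index = int((1.0 - keep) * len(scores))
    return scores[index]

# Hypothetical two-component model over 2-D descriptors (illustrative values only).
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0, 0.0], [3.0, 3.0]]
VARIANCES = [[1.0, 1.0], [1.0, 1.0]]

# Only positive samples are available; there is no background class.
positives = [[0.1, -0.2], [2.9, 3.2], [0.4, 0.3], [3.1, 2.7]]
threshold = threshold_from_positives(positives, WEIGHTS, MEANS, VARIANCES, keep=0.75)

def is_target(x):
    """Accept x as the learned feature if its likelihood reaches the threshold."""
    return gmm_log_likelihood(x, WEIGHTS, MEANS, VARIANCES) >= threshold
```

Because the threshold comes from the positive samples alone, no background examples are needed; the `keep` fraction trades missed detections against false alarms.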

Relevance:

80.00%

Publisher:

Abstract:

Visual object tracking has been one of the most popular research topics in the field of computer vision recently. Specifically, hand tracking has attracted significant attention since it would enable many useful practical applications. However, hand tracking is still a very challenging problem which cannot be considered solved. The fact that almost every aspect of hand appearance can change is the fundamental reason for this difficulty. This thesis focused on 2D-based hand tracking in high-speed camera videos. During the project, a toolbox for this purpose was collected which contains nine different tracking methods. In the experiments, these methods were tested and compared against each other with both high-speed videos recorded during the project and publicly available normal speed videos. The results revealed that tracking accuracies varied considerably depending on the video and the method. Therefore, no single method was clearly the best in all videos, but three methods, CT, HT, and TLD, performed better than the others overall. Moreover, the results provide insights about the suitability of each method to different types and situations of hand tracking.

Relevance:

80.00%

Publisher:

Abstract:

Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solving an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model in which ridges of the density estimated from the data are considered relevant features. Finding ridges, which are generalized maxima, necessitates the development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications in which most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data is also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space.
The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.

Relevance:

80.00%

Publisher:

Abstract:

Response times in a visual object recognition task decrease significantly when targets can be distinguished on the basis of two redundant attributes. The redundancy gain for two attributes is a common finding in the literature, but a gain caused by three redundant attributes has been observed only when the three attributes came from three different modalities (tactile, auditory, and visual). The present study demonstrates that a redundancy gain for three attributes from the same modality is indeed possible. It also includes a more detailed investigation of the characteristics of the redundancy gain. Besides the decrease in response times, these include a particularly marked decrease in minimum response times and an increase in the symmetry of the response time distribution. This study presents evidence that neither race models nor coactivation models can explain all the characteristics of the redundancy gain. In this context, we introduce a new method for evaluating the triple redundancy gain based on performance with doubly redundant targets. The cascade model is presented to explain the results of this study. This model comprises several processing pathways that are triggered by a cascade of activations before satisfying a single decision criterion. It offers a unified approach to earlier research on the redundancy gain. Analysis of the characteristics of response time distributions, namely their mean, symmetry, shift, and spread, is an essential tool for this study. It was important to find a statistical test capable of reflecting differences in all of these characteristics.
We address the problem of analyzing response times without loss of information, as well as the inadequacy of common analysis methods in this context, such as pooling the response times of several participants (e.g., Vincentizing). Distribution tests, the best known being the Kolmogorov-Smirnov test, are a better alternative for comparing distributions, response time distributions in particular. A test still unknown in psychology is introduced: the two-sample Anderson-Darling test. The two tests are compared, and we then present conclusive evidence of the power of the Anderson-Darling test: when comparing distributions that differ only in (1) their shift, (2) their spread, (3) their symmetry, or (4) their tails, the Anderson-Darling test detects the differences better. Moreover, the Anderson-Darling test has a Type I error rate that corresponds exactly to alpha, whereas the Kolmogorov-Smirnov test is too conservative. Consequently, the Anderson-Darling test requires less data to reach sufficient statistical power.
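To make the comparison between the two distribution tests concrete, here is a minimal sketch of both test statistics for two samples. It assumes the rank form of the two-sample Anderson-Darling statistic for samples without ties and computes statistics only, not p-values. The function names and toy samples are illustrative and do not come from the study.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    pooled = sorted(set(a) | set(b))

    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in pooled)

def anderson_darling_2samp(a, b):
    """Two-sample Anderson-Darling statistic in rank form (assumes no tied values)."""
    m, n = len(a), len(b)
    total = m + n
    # Pool the samples, remembering which sample each value came from.
    labelled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    seen_from_a = 0
    acc = 0.0
    # Sum over the first total - 1 positions of the pooled, ordered sample.
    for i, (_, label) in enumerate(labelled[:-1], start=1):
        if label == 0:
            seen_from_a += 1
        acc += (seen_from_a * total - m * i) ** 2 / (i * (total - i))
    return acc / (m * n)

# Toy response-time-like samples (arbitrary units).
interleaved = anderson_darling_2samp([1, 3, 5], [2, 4, 6])
separated = anderson_darling_2samp([1, 2, 3], [4, 5, 6])
```

The Anderson-Darling sum weights each position by `1 / (i * (total - i))`, which emphasizes differences in the tails, whereas the Kolmogorov-Smirnov statistic looks only at the single largest gap. For real analyses, library routines such as `scipy.stats.anderson_ksamp` and `scipy.stats.ks_2samp` handle ties and significance levels.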

Relevance:

80.00%

Publisher:

Abstract:

Studies are divided on the cognitive after-effects of concussions: some suggest that they resolve quickly, whereas others indicate that they persist over time. However, no data exist to indicate whether a cognitive task such as visual mental imagery reveals after-effects following a concussion. The present study therefore aims to evaluate the effect of sports-related concussions on athletes' visual mental imagery of objects and spatial imagery abilities. To this end, we compare mental imagery abilities in male university-level football players with no recorded history of concussion (n=15) and in a second group of athletes who had sustained at least one concussion (n=15). Our hypothesis is that non-concussed athletes have better mental imagery than concussed athletes. The results refute our hypothesis. Concussed athletes perform as well as non-concussed athletes on the following three tests: the Paper Folding Test (PFT), the Visual Object Identification Task (VOIT), and the Vividness of Visual Imagery Questionnaire (VVIQ). Moreover, neither the number of concussions nor the time elapsed since the last concussion influences the performance of concussed athletes.

Relevance:

80.00%

Publisher:

Abstract:

This report presents an algorithm for locating the cut points for, and separating, vertically attached traffic signs in Sweden. The algorithm combines several digital image processing steps: a binary image that represents the visual object and its complex rectangular background with ones and zeros, respectively; improved cross-correlation, which measures the similarity of 2D objects and filters traffic sign candidates; simplified shape decomposition, which iteratively smooths the contour of the visual object to reduce white noise; flipping point detection, which locates black noise candidates; and a chasm filling algorithm, which eliminates black noise, determines the final cut points, and separates originally attached traffic signs into individual ones. At each step, the intermediate results as well as the efficiency in practice are presented to show the advantages and disadvantages of the developed algorithm. This report concentrates on contour-based recognition of Swedish traffic signs. The general shapes covered are the upward triangle, downward triangle, circle, rectangle, and octagon. Finally, a demonstration program is presented to show how the algorithm works in a real-time environment.

Relevance:

80.00%

Publisher:

Abstract:

Animals must find food, mates, or a comfortable environment while avoiding possible dangers. An effective orientation strategy is an enormous advantage for them, especially when they move through a complex environment. This thesis presents a previously unknown way of optimizing orientation. It analyzes how fruit flies orient themselves in a temperature gradient and in a visually structured environment. The orientation strategy found is termed "memotaxis". It is based on the integration of information along the path travelled, with the result that the chosen direction is maintained more and more stereotypically in proportion to the positive feedback. Although memotaxis is perfectly suited to orientation in noisy gradients, its existence was demonstrated in situations with little noise. In the temperature gradient, the strategy causes flies to overshoot a temperature optimum by a greater distance the farther they had previously walked toward it. They show similar behavior when approaching visual stimuli: the farther they walk toward a landmark, the longer it takes them to deviate from that direction after the landmark disappears. This holds even when the fly is offered another landmark at the moment the first one disappears. Memotaxis should play an important role in many animals; in the fruit fly, the available genetic methods additionally make it possible to identify the relevant brain centers and biochemical components. The ellipsoid body of the central complex is necessary for memotaxis in visual environments. Behavior on a vertical treadmill was analyzed, especially with regard to the adaptive termination of this behavior. For a long time the flies did not recognize that their behavior was not leading to the goal, and they walked upward stereotypically without making progress.
This behavior is even reinforced when the visual feedback for evaluating their behavior is strengthened.

Relevance:

80.00%

Publisher:

Abstract:

The introduction of open-plan offices in the 1960s with the intent of making the workplace more flexible, efficient, and team-oriented resulted in a higher noise floor level, which not only made concentrated work more difficult, but also caused physiological problems, such as increased stress, in addition to a loss of speech privacy. Irrelevant background human speech, in particular, has proven to be a major factor in disrupting concentration and lowering performance. Therefore, reducing the intelligibility of speech has been a goal of increasing importance in recent years. One method employed to do so is the use of masking noises, which consists of emitting a continuous noise signal over a loudspeaker system that conceals the perturbing speech. Studies have shown that while effective, the maskers employed to date (normally filtered pink noise) are generally poorly accepted by users. The collaborative "Private Workspace" project, within the scope of which this thesis was carried out, attempts to develop a coupled, adaptive noise masking system along with a physical structure to be used for open-plan offices so as to combat these issues. There is evidence to suggest that nature sounds might be better accepted as maskers, in part because they can have a visual object that acts as the source for the sound. Direct audio recordings are not recommended for various reasons, and thus the nature sounds must be synthesized. The work done consists of the synthesis of a sound texture to be used as a masker as well as its evaluation. The sound texture is composed of two parts: a wind-like noise synthesized with subtractive synthesis, and a leaf-like noise synthesized through granular synthesis.
Different combinations of these two noises produced five variations of the masker, which were evaluated at different levels along with white noise and pink noise using a modified version of an Oldenburger Satztest to test for an effect on speech intelligibility and a questionnaire to assess its subjective acceptance. The goal was to find which of the synthesized noises works best as a speech masker. This thesis first uses a theoretical introduction to establish the basics of sound perception, psychoacoustic masking, and sound texture synthesis. The design of each of the noises, as well as their respective implementations in MATLAB, is explained, followed by the procedures used to evaluate the maskers. The results obtained in the evaluation are analyzed. Lastly, conclusions are drawn, and future work and modifications to the masker are proposed. SUMMARY. The introduction of open-plan offices in the 1960s was intended to make the working environment more flexible, more efficient, and more oriented toward teamwork. As a consequence, the background noise level rose, which not only hampers concentration but also causes physiological problems, such as increased stress, in addition to reducing privacy. Studies show that background conversations in particular have a negative effect on concentration and reduce workers' performance. Reducing speech intelligibility is therefore one of the main current objectives. One method used to do so is masking noise, which consists of playing continuous noise signals through a loudspeaker system that masks the speech. Although various studies show that the method is effective, the noises used to date (normally filtered pink noise) are not very well accepted by users.
The collaborative "Private Workspace" project, which encompasses the work carried out in this final degree project, aims to develop a coupled, adaptive masking noise system, together with a physical structure, for use in open-plan offices in order to combat the problems described above. There are indications that natural sounds are better accepted, in part because they can have a physical structure that appears to be their source. Using direct recordings of these sounds is not recommended for several reasons, and the natural sounds must therefore be synthesized. The present work consists of the synthesis of a sound texture to be used as masking noise, together with its evaluation. The texture is composed of two parts: a wind sound synthesized by subtractive synthesis and a leaf sound synthesized by granular synthesis. Different combinations of these two sounds produce five variations of masking noise. These five noises were evaluated at different levels, along with white noise and pink noise, using a modified version of an Oldenburger Satztest to determine how they affect speech intelligibility, and using a questionnaire for a subjective evaluation of their acceptance. The aim was to find which of the synthesized noises works best as a speech masker. The project consists of a theoretical introduction that establishes the basics of sound perception, psychoacoustic masking, and sound texture synthesis. The design of each of the noises and its implementation in MATLAB are then explained. The procedures used to evaluate them are subsequently detailed. The results obtained are analyzed and conclusions are drawn. Finally, possible future work and improvements to the synthesized noise are proposed.

Relevance:

80.00%

Publisher:

Abstract:

Classic identity negative priming (NP) refers to the finding that when an object is ignored, subsequent naming responses to it are slower than when it has not been previously ignored (Tipper, S.P., 1985. The negative priming effect: inhibitory priming by ignored objects. Q. J. Exp. Psychol. 37A, 571-590). It is unclear whether this phenomenon arises due to the involvement of abstract semantic representations that the ignored object accesses automatically. Contemporary connectionist models propose a key role for the anterior temporal cortex in the representation of abstract semantic knowledge (e.g., McClelland, J.L., Rogers, T.T., 2003. The parallel distributed processing approach to semantic cognition. Nat. Rev. Neurosci. 4, 310-322), suggesting that this region should be involved during performance of the classic identity NP task if it involves semantic access. Using high-field (4 T) event-related functional magnetic resonance imaging, we observed increased BOLD responses in the left anterolateral temporal cortex, including the temporal pole, that were directly related to the magnitude of each individual's NP effect, supporting a semantic locus. Additional signal increases were observed in the supplementary eye fields (SEF) and left inferior parietal lobule (IPL). (c) 2006 Elsevier Inc. All rights reserved.

Relevance:

80.00%

Publisher:

Abstract:

In this paper, we consider the task of recognizing epigraphs in images such as photos taken using mobile devices. Given a set of 17,155 photos related to 14,560 epigraphs, we used a k-nearest-neighbor approach to perform the recognition. The contribution of this work is in evaluating state-of-the-art visual object recognition techniques in this specific context. The experimental results show that the Vector of Locally Aggregated Descriptors, obtained by aggregating SIFT descriptors, is the best choice for this task.
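The combination of aggregated local descriptors and nearest-neighbor recognition mentioned in this abstract can be sketched as follows. This is a simplified illustration rather than the paper's pipeline: it uses toy two-dimensional descriptors in place of SIFT, a hand-picked two-word codebook, a single nearest neighbor, and hypothetical function names.

```python
import math

def nearest(vectors, x):
    """Index of the vector closest to x in squared Euclidean distance."""
    return min(
        range(len(vectors)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vectors[i], x)),
    )

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized vector of residuals."""
    dim = len(centroids[0])
    residuals = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        k = nearest(centroids, d)  # assign the descriptor to its closest visual word
        for j in range(dim):
            residuals[k][j] += d[j] - centroids[k][j]
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def classify_1nn(query_vector, gallery):
    """Return the label of the gallery entry nearest to the query vector."""
    best = nearest([vec for _, vec in gallery], query_vector)
    return gallery[best][0]

# Hypothetical codebook and labelled reference images (toy 2-D descriptors).
CENTROIDS = [[0.0, 0.0], [10.0, 10.0]]
gallery = [
    ("epigraph_A", vlad([[1.0, 0.0], [11.0, 10.0]], CENTROIDS)),
    ("epigraph_B", vlad([[0.0, 1.0], [10.0, 11.0]], CENTROIDS)),
]
query = vlad([[2.0, 0.0], [12.0, 10.5]], CENTROIDS)
label = classify_1nn(query, gallery)
```

Each image becomes one fixed-length vector regardless of how many local descriptors it has, which is what makes a plain nearest-neighbor search over the whole collection practical.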

Relevance:

80.00%

Publisher:

Abstract:

Today, most conventional surveillance networks are based on analog systems, which have many constraints, such as manpower and high-bandwidth requirements. These constraints have become a barrier to the development of today's surveillance networks. This dissertation describes a digital surveillance network architecture based on the H.264 coding/decoding (CODEC) System-on-a-Chip (SoC) platform. The proposed digital surveillance network architecture includes three major layers: the software layer, the hardware layer, and the network layer. The following outlines the contributions to the proposed digital surveillance network architecture. (1) We implement an object recognition system and an object categorization system on the software layer by applying several Digital Image Processing (DIP) algorithms. (2) For a better compression ratio and higher-quality video transfer, we implement two new modules on the hardware layer of the H.264 CODEC core, i.e., the background elimination module and the Directional Discrete Cosine Transform (DDCT) module. (3) Furthermore, we introduce a Digital Signal Processor (DSP) sub-system on the main bus of H.264 SoC platforms as the major hardware support system for our software architecture. Thus we combine the software and hardware platforms into an intelligent surveillance node. Lab results show that the proposed surveillance node can dramatically reduce the use of network resources such as bandwidth and storage capacity.

Relevance:

80.00%

Publisher:

Abstract:

Current state-of-the-art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR-specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.

Relevance:

80.00%

Publisher:

Abstract:

Illustration applied to branding results from a reflective mode on the part of the author. That mode is, in itself, the role of the illustrator as graphic designer. A brand's identity is born of its history, context, and sensations, which the author absorbs and conveys, according to their own experiences, in order to respond to the needs of the people around them. Developing a brand is a long, continuous, and demanding process of analysis and reflection. When illustration is applied to this medium as the main visual object, language ceases to be a barrier and the identity is communicated to anyone's eyes and memory, immediately and effectively. Conceptually, Tinta Barroca absorbs these principles, becoming a brand of cultural events, although strongly focused on events that may range from very Portuguese dinners to wine tastings. The project was developed on an experimental basis. All of the brand's illustrations were first produced by hand and later processed digitally, testing different shapes, textures, and materials. Illustrative excess is the starting point for communicating the ideologies of Tinta Barroca, which are based on the baroque, eroticism, and the pleasures of life. The brand's graphic identity blends with a predefined decoration: a table laden with flowers, fruit, wine, and divine food that, through excess, approaches the principles of the baroque.

Relevance:

80.00%

Publisher:

Abstract:

Hand detection in images has important applications in the recognition of human activities. This thesis focuses on the PASCAL Visual Object Classes (VOC) system for hand detection. VOC has become a popular system for object detection, based on twenty common objects, and was released with a successful deformable parts model in VOC2007. A hand is counted as detected in an image when the system returns a bounding box that overlaps by at least 50% with any ground truth bounding box for a hand in the image. The initial average precision of this detector is around 0.215, compared with a state-of-the-art value of 0.104; however, color and frequency features for detected bounding boxes contain important information for re-scoring, and the average precision can be improved to 0.218 with these features. Results show that these features help achieve higher precision at low recall, even though the average precision is similar.
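The 50% overlap criterion in this abstract can be made concrete with a short sketch. It assumes that overlap is measured as intersection over union (IoU), as in the PASCAL VOC evaluation, and that boxes are given as `(x_min, y_min, x_max, y_max)` tuples. The function names and the inclusive threshold are illustrative choices, not taken from the thesis.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_detection(predicted, ground_truths, threshold=0.5):
    """True if the predicted box overlaps any ground truth box by at least `threshold`."""
    return any(iou(predicted, gt) >= threshold for gt in ground_truths)

# A predicted hand box shifted 2 pixels from a 10x10 ground truth box.
hit = is_detection((0, 0, 10, 10), [(2, 0, 12, 10)])
# A predicted box that covers only half of the ground truth box.
miss = is_detection((0, 0, 10, 10), [(5, 0, 15, 10)])
```

In the second case half of each box is shared, yet the IoU is only one third, so the prediction does not count as a detection under this reading of the criterion.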

Relevance:

50.00%

Publisher:

Abstract:

To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open-ended character both of natural kinds and of artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.
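The idea of representing a new object by its similarities to a few stored prototypes per category can be sketched as below. This is only an illustration under stated assumptions: objects are toy two-dimensional feature vectors, similarity is a Gaussian function of Euclidean distance, and the decision rule (highest mean similarity) is a hypothetical choice rather than the scheme proposed in the abstract.

```python
import math

def similarity(x, prototype, scale=1.0):
    """Gaussian similarity: 1.0 for identical vectors, decaying with distance."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-dist_sq / (2.0 * scale ** 2))

def similarity_profile(x, prototypes_by_category):
    """Mean similarity of x to the prototypes of each category."""
    return {
        category: sum(similarity(x, p) for p in protos) / len(protos)
        for category, protos in prototypes_by_category.items()
    }

def categorize(x, prototypes_by_category):
    """Assign x to the category whose prototypes it resembles most on average."""
    profile = similarity_profile(x, prototypes_by_category)
    return max(profile, key=profile.get)

# Hypothetical prototypes: a few representative members per category.
PROTOTYPES = {
    "cat": [[0.0, 0.0], [1.0, 0.0]],
    "car": [[5.0, 5.0], [6.0, 5.0]],
}
new_instance = [0.5, 0.2]  # not identical to any stored example
category = categorize(new_instance, PROTOTYPES)
```

The new instance matches no stored example exactly, but its similarity profile still places it in a category, which is the sense in which prototypes can serve categorization without interpolation between stored views.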