975 results for Visual Object Identification
Abstract:
This report presents an algorithm for locating the cut points for and separating vertically attached traffic signs in Sweden. The algorithm provides several advanced digital image processing features: a binary image, which represents the visual object and its complex rectangular background with ones and zeros respectively; improved cross-correlation, which shows the similarity of 2D objects and filters traffic sign candidates; simplified shape decomposition, which iteratively smooths the contour of the visual object to reduce white noise; flipping point detection, which locates black noise candidates; and a chasm-filling algorithm, which eliminates black noise, determines the final cut points, and separates the originally attached traffic signs into individual ones. At each step, the intermediate results and the efficiency in practice are presented to show the advantages and disadvantages of the developed algorithm. This report concentrates on contour-based recognition of Swedish traffic signs. The general shapes covered are the upward triangle, downward triangle, circle, rectangle, and octagon. Finally, a demonstration program is presented to show how the algorithm works in a real-time environment.
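The cross-correlation step mentioned in this abstract can be illustrated with a minimal sketch. It shows the standard zero-mean normalized cross-correlation between two equal-sized binary images, not the report's improved variant, whose details the abstract does not give. The function name and the toy shapes are assumptions made for illustration.

```python
from math import sqrt


def normalized_cross_correlation(image_a, image_b):
    """Zero-mean normalized cross-correlation of two equal-sized 2D grids.

    Returns a value in [-1, 1]; 1 means the two shapes match exactly.
    """
    flat_a = [float(v) for row in image_a for v in row]
    flat_b = [float(v) for row in image_b for v in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and the same size")

    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(flat_a, flat_b))
    energy_a = sum((a - mean_a) ** 2 for a in flat_a)
    energy_b = sum((b - mean_b) ** 2 for b in flat_b)
    denominator = sqrt(energy_a * energy_b)
    # A constant image carries no shape information, so report no correlation.
    return numerator / denominator if denominator else 0.0


# Hypothetical binary shapes: 1 marks the object, 0 the background.
UPWARD_TRIANGLE = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
]
DOWNWARD_TRIANGLE = UPWARD_TRIANGLE[::-1]
```

A candidate region could be compared against each shape template in this way and kept only if its best score exceeds a chosen threshold; the threshold itself would have to be tuned on real data.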
Abstract:
Animals must find food, mates, or a comfortable environment while avoiding potential dangers. An effective orientation strategy is an enormous advantage for them, especially when they move through a complex environment. This thesis presents a previously unknown way of optimizing orientation. It analyzes how fruit flies orient themselves in a temperature gradient and in a visually structured environment. The orientation strategy found is termed "memotaxis". It is based on the integration of information along the path travelled, which causes the chosen direction to be maintained ever more stereotypically in proportion to the positive feedback. Although memotaxis is ideally suited to orientation in noisy gradients, its existence was demonstrated in situations with little noise. In the temperature gradient, the strategy causes flies to overshoot a temperature optimum by a greater distance the farther they had previously walked toward it. They show similar behavior when approaching visual stimuli: the farther they walk toward a landmark, the longer it takes them to deviate from that direction after the landmark disappears. This holds even when a different landmark is offered to the fly at the moment the first one disappears. Memotaxis should play an important role in many animals; in the fruit fly, the available genetic methods additionally make it possible to identify the relevant brain centers and biochemical components. The ellipsoid body of the central complex is necessary for memotaxis in visual environments.
The behavior on a vertical treadmill was analyzed, particularly with regard to the adaptive termination of this behavior. For a long time the flies did not recognize that their behavior was not goal-directed, and they walked upward stereotypically without making progress. This behavior is even reinforced when the visual feedback for evaluating their behavior is strengthened.
Abstract:
The unsupervised categorization of sensory stimuli is typically attributed to feedforward processing in a hierarchy of cortical areas. This purely sensory-driven view of cortical processing, however, ignores any internal modulation, e.g., by top-down attentional signals or neuromodulator release. To isolate the role of internal signaling on category formation, we consider an unbroken continuum of stimuli without intrinsic category boundaries. We show that a competitive network, shaped by recurrent inhibition and endowed with Hebbian and homeostatic synaptic plasticity, can enforce stimulus categorization. The degree of competition is internally controlled by the neuronal gain and the strength of inhibition. Strong competition leads to the formation of many attracting network states, each being evoked by a distinct subset of stimuli and representing a category. Weak competition allows more neurons to be co-active, resulting in fewer but larger categories. We conclude that the granularity of cortical category formation, i.e., the number and size of emerging categories, is not simply determined by the richness of the stimulus environment, but rather by some global internal signal modulating the network dynamics. The model also explains the salient non-additivity of visual object representation observed in the monkey inferotemporal (IT) cortex. Furthermore, it offers an explanation of a previously observed, demand-dependent modulation of IT activity on a stimulus categorization task and of categorization-related cognitive deficits in schizophrenic patients.
Abstract:
Background: Statistical shape models are widely used in biomedical research. They are routinely implemented for automatic image segmentation or object identification in medical images. In these fields, however, the acquisition of the large training datasets, required to develop these models, is usually a time-consuming process. Even after this effort, the collections of datasets are often lost or mishandled resulting in replication of work. Objective: To solve these problems, the Virtual Skeleton Database (VSD) is proposed as a centralized storage system where the data necessary to build statistical shape models can be stored and shared. Methods: The VSD provides an online repository system tailored to the needs of the medical research community. The processing of the most common image file types, a statistical shape model framework, and an ontology-based search provide the generic tools to store, exchange, and retrieve digital medical datasets. The hosted data are accessible to the community, and collaborative research catalyzes their productivity. Results: To illustrate the need for an online repository for medical research, three exemplary projects of the VSD are presented: (1) an international collaboration to achieve improvement in cochlear surgery and implant optimization, (2) a population-based analysis of femoral fracture risk between genders, and (3) an online application developed for the evaluation and comparison of the segmentation of brain tumors. Conclusions: The VSD is a novel system for scientific collaboration for the medical image community with a data-centric concept and semantically driven search option for anatomical structures. The repository has been proven to be a useful tool for collaborative model building, as a resource for biomechanical population studies, or to enhance segmentation algorithms.
Abstract:
The introduction of open-plan offices in the 1960s with the intent of making the workplace more flexible, efficient, and team-oriented resulted in a higher noise floor level, which not only made concentrated work more difficult, but also caused physiological problems, such as increased stress, in addition to a loss of speech privacy. Irrelevant background human speech, in particular, has proven to be a major factor in disrupting concentration and lowering performance. Therefore, reducing the intelligibility of speech has been a goal of increasing importance in recent years. One method employed to do so is the use of masking noise, which consists of emitting a continuous noise signal over a loudspeaker system that conceals the perturbing speech. Studies have shown that, while effective, the maskers employed to date (normally filtered pink noise) are generally poorly accepted by users. The collaborative "Private Workspace" project, within the scope of which this thesis was carried out, attempts to develop a coupled, adaptive noise masking system along with a physical structure to be used in open-plan offices so as to combat these issues. There is evidence to suggest that nature sounds might be better accepted as maskers, in part because they can have a visual object that acts as the source of the sound. Direct audio recordings are not recommended for various reasons, and thus the nature sounds must be synthesized. The work done consists of the synthesis of a sound texture to be used as a masker, as well as its evaluation. The sound texture is composed of two parts: a wind-like noise synthesized with subtractive synthesis, and a leaf-like noise synthesized through granular synthesis.
Different combinations of these two noises produced five variations of the masker, which were evaluated at different levels along with white noise and pink noise, using a modified version of an Oldenburger Satztest to test for an effect on speech intelligibility and a questionnaire to assess subjective acceptance. The goal was to find which of the synthesized noises works best as a speech masker. This thesis first uses a theoretical introduction to establish the basics of sound perception, psychoacoustic masking, and sound texture synthesis. The design of each of the noises, as well as their respective implementations in MATLAB, is explained, followed by the procedures used to evaluate the maskers. The results obtained in the evaluation are analyzed. Lastly, conclusions are drawn, and future work and modifications to the masker are proposed. SUMMARY. The introduction of open-plan offices in the 1960s was intended to make the working environment more flexible, more efficient, and more oriented toward teamwork. As a consequence, the background noise level rose, which not only hinders concentration but also causes physiological problems, such as increased stress, and reduces privacy. Studies show that background conversations in particular have a negative effect on concentration and lower workers' performance. Reducing speech intelligibility is therefore one of the main goals at present. One method used to do so is masking noise, which consists of playing continuous noise signals over a loudspeaker system to mask the speech. Although various studies show that the method is effective, the noises used to date (normally filtered pink noise) are not well accepted by users.
The collaborative "Private Workspace" project, which encompasses the work carried out in this final-year degree project, aims to develop a coupled, adaptive masking noise system together with a physical structure, for use in open-plan offices, in order to combat the problems described above. There are indications that natural sounds are better accepted, in part because they can have a physical structure that appears to be their source. Using direct recordings of these sounds is not recommended for several reasons, and the natural sounds must therefore be synthesized. The present work consists of synthesizing a sound texture for use as a masking noise, and evaluating it. The texture is composed of two parts: a wind sound synthesized by subtractive synthesis and a leaf sound synthesized by granular synthesis. Different combinations of these two sounds produce five variations of the masking noise. These five noises were evaluated at different levels, together with white noise and pink noise, using a modified version of an Oldenburger Satztest to check how they affect speech intelligibility, and a questionnaire for a subjective evaluation of their acceptance. The goal was to find which of the synthesized noises works best as a speech masker. The project consists of a theoretical introduction that establishes the foundations of sound perception, psychoacoustic masking, and sound texture synthesis. The design of each of the noises and its implementation in MATLAB are then explained. The procedures used to evaluate them are subsequently detailed. The results obtained are analyzed and conclusions are drawn. Finally, possible future work and improvements to the synthesized noise are proposed.
Abstract:
In this paper, we consider the task of recognizing epigraphs in images such as photos taken with mobile devices. Given a set of 17,155 photos related to 14,560 epigraphs, we used a k-nearest-neighbor approach to perform the recognition. The contribution of this work is in evaluating state-of-the-art visual object recognition techniques in this specific context. The experimental results show that the Vector of Locally Aggregated Descriptors (VLAD), obtained by aggregating SIFT descriptors, is the best choice for this task.
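The aggregation named in this abstract can be sketched as follows. This is a minimal illustration of the usual VLAD construction (assign each local descriptor to its nearest codebook centroid, sum the residuals per centroid, concatenate, and L2-normalize), followed by a 1-nearest-neighbor lookup, which is the k = 1 case of the k-nearest-neighbor approach. The function names, the tiny 2-D descriptors, and the codebook are assumptions for illustration; the paper's actual pipeline, codebook size, and normalization are not given in the abstract.

```python
from math import sqrt


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for descriptor in descriptors:
        # Hard-assign the descriptor to its nearest codebook centroid.
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(descriptor, centroids[k]))
        for i in range(dim):
            residual_sums[nearest][i] += descriptor[i] - centroids[nearest][i]
    flat = [value for row in residual_sums for value in row]
    norm = sqrt(sum(value * value for value in flat))
    return [value / norm for value in flat] if norm else flat


def nearest_label(query_vector, database):
    """Return the label of the database entry closest to the query.

    database: list of (label, vector) pairs.
    """
    return min(database, key=lambda entry: squared_distance(query_vector, entry[1]))[0]
```

In practice the local descriptors would be 128-dimensional SIFT vectors and the codebook would be learned by clustering; the sketch keeps both as plain lists so that the aggregation step stays visible.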
Abstract:
Background: Light microscopic analysis of diatom frustules is widely used both in basic and applied research, notably taxonomy, morphometrics, water quality monitoring and paleo-environmental studies. In these applications, usually large numbers of frustules need to be identified and / or measured. Although there is a need for automation in these applications, and image processing and analysis methods supporting these tasks have previously been developed, they did not become widespread in diatom analysis. While methodological reports for a wide variety of methods for image segmentation, diatom identification and feature extraction are available, no single implementation combining a subset of these into a readily applicable workflow accessible to diatomists exists. Results: The newly developed tool SHERPA offers a versatile image processing workflow focused on the identification and measurement of object outlines, handling all steps from image segmentation over object identification to feature extraction, and providing interactive functions for reviewing and revising results. Special attention was given to ease of use, applicability to a broad range of data and problems, and supporting high throughput analyses with minimal manual intervention. Conclusions: Tested with several diatom datasets from different sources and of various compositions, SHERPA proved its ability to successfully analyze large amounts of diatom micrographs depicting a broad range of species. 
SHERPA is unique in combining the following features: application of multiple segmentation methods and selection of the one giving the best result for each individual object; identification of shapes of interest based on outline matching against a template library; quality scoring and ranking of resulting outlines supporting quick quality checking; extraction of a wide range of outline shape descriptors widely used in diatom studies and elsewhere; minimizing the need for, but enabling manual quality control and corrections. Although primarily developed for analyzing images of diatom valves originating from automated microscopy, SHERPA can also be useful for other object detection, segmentation and outline-based identification problems.
Abstract:
Current state of the art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.
Abstract:
Illustration applied to branding results from a reflective mode on the part of the author. This mode is, in itself, the role of the illustrator as graphic designer. A brand's identity is born of its history, context, and sensations, which the author acquires and conveys according to his or her own experiences in order to respond to the needs of the surrounding people. Developing a brand is a long, continuous, and demanding process of analysis and reflection. When illustration is applied to this medium as the principal visual object, language ceases to be a barrier and the identity is communicated to anyone's eyes and memory, immediately and effectively. Conceptually, Tinta Barroca absorbs these principles, becoming a brand of cultural events, albeit one strongly focused on events that may include very Portuguese dinners or wine tastings. The project was developed on the basis of experimentation. All of the brand's illustrations were first produced by hand and then processed digitally, testing different forms, textures, and materials. Illustrative excess is the starting point for communicating the ideologies of Tinta Barroca, which draw on the baroque, eroticism, and the pleasures of life. The brand's graphic identity blends with a pre-defined decor: a well-laden table filled with flowers, fruit, wine, and divine food, which by their excess approach the principles of the baroque.
Abstract:
Hand detection in images has important applications in recognizing human activities. This thesis focuses on the PASCAL Visual Object Classes (VOC) system for hand detection. VOC has become a popular system for object detection, based on twenty common objects, and has been released with a successful deformable parts model in VOC2007. A hand detection in an image is made when the system returns a bounding box that overlaps by at least 50% with any ground-truth bounding box for a hand in the image. The initial average precision of this detector is around 0.215, compared with a state of the art of 0.104; however, color and frequency features for detected bounding boxes contain important information for re-scoring, and the average precision can be improved to 0.218 with these features. Results show that these features help to achieve higher precision at low recall, even though the average precision is similar.
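The overlap criterion in this abstract can be made concrete with a short sketch. It assumes that overlap is measured as intersection over union (IoU), as in the PASCAL VOC detection criterion, and that boxes are given as continuous `(x_min, y_min, x_max, y_max)` coordinates; the function names are hypothetical and the thesis may compute overlap with different conventions.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct_detection(predicted_box, ground_truth_boxes, threshold=0.5):
    """True if the prediction overlaps any ground-truth box by at least `threshold`."""
    return any(intersection_over_union(predicted_box, truth) >= threshold
               for truth in ground_truth_boxes)
```

For example, two 10 × 10 boxes shifted horizontally by 5 units share an area of 50 out of a union of 150, giving an IoU of 1/3, which falls below the 50% threshold.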
Abstract:
Classical computer vision methods can only weakly emulate some of the multi-level parallelism in signal processing and information sharing that takes place in different parts of the primate visual system and enables it to accomplish many diverse functions of visual perception. One of the main functions of primate vision is to detect and recognise objects in natural scenes despite all the linear and non-linear variations of the objects and their environment. The superior performance of the primate visual system compared with what machine vision systems have achieved to date motivates scientists and researchers to explore this area further in pursuit of more efficient vision systems inspired by natural models. In this paper, building blocks for an efficient hierarchical object recognition model are proposed. Incorporating attention-based processing would lead to a system that processes the visual data in a non-linear way, focusing only on the regions of interest and hence reducing processing time so as to achieve real-time performance. Further, it is suggested to modify the visual cortex model for recognizing objects by adding non-linearities in the ventral path, consistent with earlier discoveries reported by researchers in the neurophysiology of vision.
Abstract:
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine based on Prolog. Threat detection performance is evaluated by testing against a range of datasets describing realistic situations, and the results demonstrate a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
Abstract:
This paper describes the participation of DAEDALUS at ImageCLEF 2011 Plant Identification task. The task is evaluated as a supervised classification problem over 71 tree species from the French Mediterranean area used as class labels, based on visual content from scan, scan-like and natural photo images. Our approach to this task is to build a classifier based on the detection of keypoints from the images extracted using Lowe’s Scale Invariant Feature Transform (SIFT) algorithm. Although our overall classification score is very low as compared to other participant groups, the main conclusion that can be drawn is that SIFT keypoints seem to work significantly better for photos than for the other image types, so our approach may be a feasible strategy for the classification of this kind of visual content.
Abstract:
Magdeburg, Univ., Faculty of Computer Science, doctoral dissertation, 2014
Abstract:
This bachelor's thesis was carried out as part of the PulpVision research project, whose aim is to develop image-based computation and classification methods for pulp quality control in papermaking. As part of this research project, a method had previously been developed for finding curved structures in images, and this method was applied to finding fibers in images. This method was used as the starting point for the bachelor's thesis. The aim of the work was to investigate whether the species of the fibers in an image can be identified using features computed from different fiber images. The fiber images contained fibers from four tree species and one plant: acacia, birch, pine, eucalyptus, and wheat. For each species, 100 fiber images were selected, and these images were divided into two groups, the first used as a training set and the second as a test set. Using the training set, descriptive features were computed for each fiber species, and these features were then used to try to identify the fiber species in the test-set images. The images had been produced by CEMIS-Oulu (Center for Measurement and Information Systems), a unit at the University of Oulu specializing in measurement technology. For each individual fiber image in the training set, the means and standard deviations of three features were computed: length, width, and curvature. In addition, the number of fibers found in the image was counted. Using different combinations of these features, identification accuracy was tested on the test-set images with the k-nearest-neighbor method and a naive Bayes classifier. The tests gave promising results; for example, using the means of length and width, an accuracy of up to about 98% was achieved with both algorithms. In identification, the mean fiber length appeared to be the feature that best characterizes the fiber images. There was no large difference in accuracy between the algorithms used. Based on the test results, it can be concluded that identifying fiber images is possible.
Based on the tests, only two features need to be computed from the fiber images to identify the fibers accurately. The classification algorithms used were very simple, but they worked well in the tests.
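The k-nearest-neighbor classification on two per-image features described in this abstract can be sketched as below. This is a minimal illustration, not the thesis implementation: the function name, the choice of k = 3, the Euclidean distance, and above all the feature values in `TOY_TRAINING_SET` are made up for demonstration and are not measurements from the thesis.

```python
from collections import Counter
from math import dist


def knn_predict(training_set, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    training_set: list of (feature_vector, label) pairs.
    """
    if k < 1 or k > len(training_set):
        raise ValueError("k must be between 1 and the training set size")
    neighbors = sorted(training_set, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented (mean fiber length, mean fiber width) pairs in arbitrary units,
# used only to exercise the function.
TOY_TRAINING_SET = [
    ((2.6, 31.0), "pine"),
    ((2.8, 33.0), "pine"),
    ((2.7, 30.0), "pine"),
    ((0.9, 18.0), "birch"),
    ((1.0, 19.0), "birch"),
    ((1.1, 17.0), "birch"),
]
```

Because the two features can have very different numeric ranges, a real pipeline would probably standardize them before computing distances; the sketch omits this to stay short.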