975 results for Visual Object Identification
Abstract:
ENVISAT ASAR WSM images with a pixel size of 150 × 150 m, acquired under different meteorological, oceanographic and sea ice conditions, were used to detect icebergs in the Amundsen Sea (Antarctica). An object-based method for automatic iceberg detection from SAR data has been developed and applied. The object identification is based on spectral and spatial parameters at 5 scale levels, and was verified against manual classification in four polygon areas chosen to represent varying environmental conditions. The algorithm works comparatively well under the freezing temperatures and strong wind conditions that prevail in the Amundsen Sea throughout the year. The detection rate was 96%, corresponding to 94% of the area (counting icebergs larger than 0.03 km²), across all seasons. The algorithm tends to err in the form of false alarms, mainly caused by the presence of ice floes, rather than misses; this affects reliability, and false alarms were corrected manually after the analysis.
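The size cutoff follows from the pixel geometry: at 150 m × 150 m, each pixel covers 0.0225 km², so the 0.03 km² threshold corresponds to objects of at least two pixels. Below is a minimal sketch of such an object-based size filter, assuming a simple backscatter threshold; the paper's multi-scale spectral/spatial rules are not reproduced, and all names are illustrative.

```python
import numpy as np
from scipy import ndimage

PIXEL_AREA_KM2 = 0.150 * 0.150   # ENVISAT ASAR WSM: 150 m x 150 m pixels
MIN_AREA_KM2 = 0.03              # smallest iceberg counted in the study

def candidate_icebergs(backscatter, threshold):
    """Threshold bright targets, label connected components, filter by area."""
    labels, n = ndimage.label(backscatter > threshold)
    pixel_counts = np.bincount(labels.ravel())[1:]    # drop background (label 0)
    areas = pixel_counts * PIXEL_AREA_KM2
    keep = np.flatnonzero(areas >= MIN_AREA_KM2) + 1  # label ids above the cutoff
    return labels, keep, areas[keep - 1]
```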
Abstract:
Automatic visual object counting and video surveillance have important applications in home and business environments, such as security and the management of access points. However, to obtain satisfactory performance, these technologies need professional and expensive hardware, complex installations and setups, and the supervision of qualified workers. In this paper, an efficient visual detection and tracking framework is proposed for the tasks of object counting and surveillance that meets the requirements of consumer electronics: off-the-shelf equipment, easy installation and configuration, and unsupervised working conditions. This is accomplished by a novel Bayesian tracking model that can manage multimodal distributions without explicitly computing the association between tracked objects and detections. In addition, it is robust to erroneous, distorted and missing detections. The proposed algorithm is compared with a recent work, also aimed at consumer electronics, demonstrating its superior performance.
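One generic way to realize association-free Bayesian tracking is a particle filter whose particles are weighted by a mixture likelihood over all current detections, so no hard object-to-detection assignment is ever computed. The sketch below is an illustrative stand-in under that assumption, not the authors' actual model; the motion model and all parameters are invented.

```python
import numpy as np

def step(particles, detections, motion_std=2.0, obs_std=5.0, clutter=1e-4):
    """One predict/update/resample cycle; particles is an (N, 2) array."""
    # Predict: random-walk motion model (assumption).
    particles = particles + np.random.normal(0, motion_std, particles.shape)
    # Update: a mixture likelihood over ALL detections handles multimodality
    # and stays robust to missing/spurious detections via the clutter term.
    w = np.full(len(particles), clutter)
    for d in detections:
        dist2 = np.sum((particles - np.asarray(d)) ** 2, axis=1)
        w += np.exp(-dist2 / (2 * obs_std ** 2))
    w /= w.sum()
    # Resample (multinomial for brevity; systematic resampling is preferable).
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```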
Abstract:
The perception of an object as a single entity within a visual scene requires that its features be bound together and segregated from the background and/or other objects. Here, we used magnetoencephalography (MEG) to assess the hypothesis that coherent percepts may arise from synchronized high-frequency (gamma) activity between neurons that code features of the same object. We also assessed the role of low-frequency (alpha, beta) activity in object processing. The target stimulus (i.e. the object) was a small patch of a concentric grating of 3 c/°, viewed eccentrically. The background stimulus was either a blank field or a concentric grating of 3 c/° periodicity, viewed centrally. With patterned backgrounds, the target stimulus emerged, through rotation about its own centre, as a circular subsection of the background. Data were acquired using a 275-channel whole-head MEG system and analyzed using Synthetic Aperture Magnetometry (SAM), which allows one to generate images of task-related cortical oscillatory power changes within specific frequency bands. Significant oscillatory activity across a broad range of frequencies was evident at the V1/V2 border, and subsequent analyses were based on a virtual electrode at this location. When the target was presented in isolation, we observed that: (i) contralateral stimulation yielded a sustained power increase in gamma activity; and (ii) both contra- and ipsilateral stimulation yielded near-identical transient power changes in alpha (and beta) activity. When the target was presented against a patterned background, we observed that: (i) contralateral stimulation yielded an increase in high-gamma (>55 Hz) power together with a decrease in low-gamma (40-55 Hz) power; and (ii) both contra- and ipsilateral stimulation yielded a transient decrease in alpha (and beta) activity, though the reduction tended to be greatest for contralateral stimulation. The opposing power changes across different regions of the gamma spectrum with 'figure/ground' stimulation suggest a possible dual role for gamma rhythms in visual object coding, and provide general support for the binding-by-synchronization hypothesis. As the power changes in alpha and beta activity were largely independent of the spatial location of the target, however, we conclude that their role in object processing may relate principally to changes in visual attention.
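The core quantity in such an analysis is the task-related change in band-limited power at the virtual electrode. A hedged sketch follows, using Welch spectra over one-second windows and the band edges given above (40-55 Hz low gamma, >55 Hz high gamma); the sampling rate, window length, and function names are assumptions, not the SAM implementation.

```python
import numpy as np
from scipy.signal import welch

def band_power_change(baseline, task, fs, band):
    """Percent power change (task vs. baseline) in a frequency band."""
    f, p_base = welch(baseline, fs=fs, nperseg=int(fs))  # 1 s windows (assumption)
    _, p_task = welch(task, fs=fs, nperseg=int(fs))
    mask = (f >= band[0]) & (f <= band[1])
    return 100.0 * (p_task[mask].mean() - p_base[mask].mean()) / p_base[mask].mean()

# e.g. band_power_change(pre, post, fs=600, band=(55, 120))   # high gamma
#      band_power_change(pre, post, fs=600, band=(40, 55))    # low gamma
```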
Abstract:
As we look around a scene, we perceive it as continuous and stable even though each saccadic eye movement changes the visual input to the retinas. How the brain achieves this perceptual stabilization is unknown, but a major hypothesis is that it relies on presaccadic remapping, a process in which neurons shift their visual sensitivity to a new location in the scene just before each saccade. This hypothesis is difficult to test in vivo because complete, selective inactivation of remapping is currently intractable. We tested it in silico with a hierarchical, sheet-based neural network model of the visual and oculomotor system. The model generated saccadic commands to move a video camera abruptly. Visual input from the camera and internal copies of the saccadic movement commands, or corollary discharge, converged at a map-level simulation of the frontal eye field (FEF), a primate brain area known to receive such inputs. FEF output was combined with eye position signals to yield a suitable coordinate frame for guiding arm movements of a robot. Our operational definition of perceptual stability was "useful stability," quantified as continuously accurate pointing to a visual object despite camera saccades. During training, the emergence of useful stability was correlated tightly with the emergence of presaccadic remapping in the FEF. Remapping depended on corollary discharge but its timing was synchronized to the updating of eye position. When coupled to predictive eye position signals, remapping served to stabilize the target representation for continuously accurate pointing. Graded inactivations of pathways in the model replicated, and helped to interpret, previous in vivo experiments. The results support the hypothesis that visual stability requires presaccadic remapping, provide explanations for the function and timing of remapping, and offer testable hypotheses for in vivo studies. We conclude that remapping allows for seamless coordinate frame transformations and quick actions despite visual afferent lags. With visual remapping in place for behavior, it may be exploited for perceptual continuity.
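In coordinate terms, the model's pointing computation and its remapping step reduce to two vector operations: adding an eye-position signal to a retinal location yields a stable, body-referenced target for the robot arm, and subtracting the corollary-discharge copy of the saccade vector predicts the post-saccadic retinal location. The toy sketch below illustrates only this logic, with invented names; it is not the model's map-level network machinery.

```python
import numpy as np

def head_centered(retinal_xy, eye_position_xy):
    """Stable pointing target despite eye movements: retinal + eye position."""
    return np.asarray(retinal_xy) + np.asarray(eye_position_xy)

def remapped_retinal(retinal_xy, saccade_vector_xy):
    """Predicted retinal location after the saccade, via corollary discharge."""
    return np.asarray(retinal_xy) - np.asarray(saccade_vector_xy)
```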
Abstract:
This paper presents a solution to part of the problem of making robotic or semi-robotic digging equipment less dependent on human supervision. A method is described for identifying rocks of a certain size that may affect digging efficiency or require special handling. The process involves three main steps. First, using range and intensity data from a time-of-flight (TOF) camera, a feature descriptor is used to rank points and separate the regions surrounding high-scoring points. This allows a wide range of rocks to be recognized, because features can represent a whole rock or just part of one. Second, these points are filtered to extract only points thought to belong to the large object. Finally, a check is carried out to verify that the resultant point cloud actually represents a rock. Results are presented from field testing on piles of fragmented rock. Note to Practitioners: This paper presents an algorithm to identify large boulders in a pile of broken rock as a step towards an autonomous mining dig planner. In mining, piles of broken rock can contain large fragments that may need to be handled specially. To assess rock piles for excavation, we make use of a TOF camera, which does not rely on external lighting, to generate a point cloud of the rock pile. We then segment large boulders from the pile's surface using a novel feature descriptor and distinguish between real and false boulder candidates. Preliminary field experiments show promising results, with the algorithm performing nearly as well as human test subjects.
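A toy analogue of the three-step pipeline is sketched below, with every scoring rule and threshold an assumption (the paper's descriptor and verification test are not reproduced): score points by local protrusion, grow a region around the best seed, then verify the candidate's spatial extent.

```python
import numpy as np

def detect_large_object(points, min_points=50, seed_radius=0.5, size_thresh=0.3):
    """Toy 3-step analogue on an (N, 3) point cloud; units are illustrative."""
    # 1) Score: height above the global median, a crude protrusion measure
    #    standing in for the paper's range/intensity feature descriptor.
    scores = points[:, 2] - np.median(points[:, 2])
    seed = points[np.argmax(scores)]
    # 2) Filter: keep points within a radius of the top-scoring seed point.
    member = np.linalg.norm(points - seed, axis=1) < seed_radius
    cluster = points[member]
    if len(cluster) < min_points:
        return None
    # 3) Verify: the candidate's horizontal extent must exceed a size threshold.
    extent = cluster.max(axis=0) - cluster.min(axis=0)
    return cluster if np.all(extent[:2] > size_thresh) else None
```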
Abstract:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores the datasets, features, learning methods, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collecting object image datasets from web pages, using an analysis of the text around each image and of the image's appearance. The method exploits established online knowledge resources (Wikipedia pages for text; the Flickr and Caltech datasets for images), which provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg's collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy, so the method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples, whereas this text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested on the PASCAL VOC 2006 and 2007 datasets. The feature performs well: it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. As more and more training data are collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called the Stochastic Intersection Kernel Machine (SIKMA). The proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck for processing large-scale datasets. This dissertation applies the approach to train classifiers for Flickr groups using many training examples per group. The resulting Flickr group prediction scores can be used to measure the similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, owing to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach that uses comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm that uses this category-dependent similarity regularization. Experiments on hundreds of categories show that our method yields significant improvements for categories with few or even no positive examples.
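A minimal sketch of the k-nearest-neighbour text feature described above: an unannotated image borrows the (noisy) tags of its k visually nearest neighbours in the auxiliary tagged collection, yielding a soft bag-of-tags vector. The visual feature representation and the tag matrix layout here are assumptions.

```python
import numpy as np

def knn_tag_feature(query_feat, aux_feats, aux_tag_matrix, k=5):
    """Soft tag histogram from the k nearest neighbours.

    query_feat:     (d,) visual feature of the unannotated image
    aux_feats:      (n_images, d) visual features of the tagged collection
    aux_tag_matrix: (n_images, n_tags) binary tag indicators
    """
    d = np.linalg.norm(aux_feats - query_feat, axis=1)  # Euclidean (assumption)
    nn = np.argsort(d)[:k]                              # k nearest neighbours
    return aux_tag_matrix[nn].mean(axis=0)              # averaged tag vector
```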
Abstract:
The Fornax Spectroscopic Survey will use the Two-degree Field (2dF) spectrograph of the Anglo-Australian Telescope to obtain spectra for a complete sample of all 14 000 objects with 16.5 ≤ b_J ≤ 19.7 in a 12-square-degree area centred on the Fornax Cluster. The aims of this project include the study of dwarf galaxies in the cluster (both known low surface brightness objects and putative normal surface brightness dwarfs) and of a comparison sample of background field galaxies. We will also measure quasars and other active galaxies, any previously unrecognised compact galaxies, and a large sample of Galactic stars. By selecting all objects, both stars and galaxies, independent of morphology, we cover a much larger range of surface brightness and scale size than previous surveys. In this paper we first describe the design of the survey. Our targets are selected from UK Schmidt Telescope sky survey plates digitised by the Automated Plate Measuring (APM) facility. We then describe the photometric and astrometric calibration of these data and show that the APM astrometry is accurate enough for use with the 2dF. We also describe a general approach to object identification using cross-correlations, which allows us to identify and classify both stellar and galaxy spectra. We present results from the first 2dF field. Redshift distributions and velocity structures are shown for all observed objects in the direction of Fornax, including Galactic stars, galaxies in and around the Fornax Cluster, and the background galaxy population. The velocity data for the stars show the contributions from the different Galactic components, plus a small tail to high velocities. We find no galaxies in the foreground to the cluster in our 2dF field. The Fornax Cluster is clearly defined kinematically. The mean velocity from the 26 cluster members having reliable redshifts is 1560 ± 80 km s⁻¹, and they show a velocity dispersion of 380 ± 50 km s⁻¹. Large-scale structure can be traced behind the cluster to a redshift beyond z = 0.3. Background compact galaxies and low surface brightness galaxies are found to follow the general galaxy distribution.
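Cross-correlation identification of this kind is typically done on a log-wavelength grid, where a redshift becomes a pure shift: the template with the strongest correlation peak classifies the spectrum, and the lag of that peak gives the redshift. A hedged sketch under those assumptions; the grids, templates, and normalization are illustrative, not the survey's pipeline.

```python
import numpy as np

def classify_spectrum(obs, templates, dlog_lambda):
    """Best-matching template and redshift via cross-correlation.

    obs:         observed spectrum on a uniform log-wavelength grid
    templates:   dict of name -> rest-frame template on the same grid
    dlog_lambda: grid spacing in log10(wavelength)
    """
    best = None
    for name, tmpl in templates.items():
        xc = np.correlate(obs - obs.mean(), tmpl - tmpl.mean(), mode="full")
        lag = int(xc.argmax()) - (len(tmpl) - 1)      # shift of the peak, in bins
        z = 10 ** (lag * dlog_lambda) - 1             # log-lambda shift -> redshift
        peak = xc.max() / (np.std(obs) * np.std(tmpl) * len(obs))  # rough strength
        if best is None or peak > best[3]:
            best = (name, lag, z, peak)
    return best   # (template name, lag, redshift, correlation strength)
```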
Abstract:
Introduction: Responses to external stimuli are typically investigated by averaging peri-stimulus electroencephalography (EEG) epochs in order to derive event-related potentials (ERPs) across the electrode montage, under the assumption that signals related to the external stimulus are fixed in time across trials. We demonstrate the applicability of a single-trial model based on patterns of scalp topographies (De Lucia et al., 2007) that can be used for ERP analysis at the single-subject level. The model is able to classify new trials (or groups of trials) with minimal a priori hypotheses, using information derived from a training dataset. The features used for the classification (the topography of responses and their latency) can be interpreted neurophysiologically, because a difference in scalp topography indicates a different configuration of brain generators. Above-chance classification accuracy on test datasets implicitly demonstrates the suitability of this model for EEG data. Methods: The data analyzed in this study were acquired from two separate visual evoked potential (VEP) experiments. The first entailed passive presentation of checkerboard stimuli to each of the four visual quadrants (hereafter, the "Checkerboard Experiment") (Plomp et al., submitted). The second entailed active discrimination of novel versus repeated line drawings of common objects (hereafter, the "Priming Experiment") (Murray et al., 2004). Four subjects per experiment were analyzed, using approximately 200 trials per experimental condition. These trials were randomly separated into training (90%) and testing (10%) datasets in 10 independent shuffles. To perform the ERP analysis, we estimated the statistical distribution of voltage topographies with a Mixture of Gaussians (MofGs), which reduces the original dataset to a small number of representative voltage topographies. We then evaluated statistically the degree to which these template maps were present across trials, and whether and when this presence differed across experimental conditions. Based on these differences, single trials or sets of a few single trials were classified as belonging to one or the other experimental condition. Classification performance was assessed using the area under the Receiver Operating Characteristic (ROC) curve. Results: For the Checkerboard Experiment, contrasts entailed left vs. right visual field presentations for the upper and lower quadrants, separately. The average posterior probabilities, indicating the presence of the computed template maps in time and across trials, revealed significant differences starting at ~60-70 ms post-stimulus. The average ROC curve area across all four subjects was 0.80 for the upper quadrants and 0.85 for the lower quadrants, and was in all cases significantly higher than chance (unpaired t-test, p < 0.0001). In the Priming Experiment, we contrasted initial versus repeated presentations of visual object stimuli. Their posterior probabilities revealed significant differences starting at 250 ms post-stimulus onset. Classification accuracy with single-trial test data was at chance level, so we considered sub-averages based on five single trials. We found that for three out of four subjects, classification rates were significantly above chance level (unpaired t-test, p < 0.0001). Conclusions: The main advantage of the present approach is that it is based on topographic features that are readily interpretable along neurophysiological lines. Because these maps were previously normalized by the overall strength of the field potential on the scalp, a change in their presence across trials and between conditions necessarily reflects a change in the underlying generator configurations. The temporal periods of statistical difference between conditions were estimated for each training dataset for ten shuffles of the data. Across the ten shuffles and in both experiments, we observed a high level of consistency in the temporal periods over which the two conditions differed. With this method we are able to analyze ERPs at the single-subject level, providing a novel tool for comparing normal electrophysiological responses with single cases that cannot be considered part of any cohort of subjects. This aspect promises to have a strong impact on both basic and clinical research.
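A compact sketch of this kind of pipeline, assuming scikit-learn and invented array shapes (trials × channels): fit a Mixture of Gaussians to single-trial voltage topographies, use the posterior template probabilities as features, and summarize test-set discrimination with the ROC area. The component count and the downstream classifier are assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def single_trial_auc(train_maps, train_y, test_maps, test_y, n_templates=6):
    """train_maps/test_maps: (n_trials, n_channels) voltage topographies."""
    # Reduce topographies to a small set of representative template maps.
    gmm = GaussianMixture(n_components=n_templates).fit(train_maps)
    # Posterior probability of each template map, per trial, as features.
    f_train = gmm.predict_proba(train_maps)
    f_test = gmm.predict_proba(test_maps)
    # Classify condition membership and score with the ROC curve area.
    clf = LogisticRegression(max_iter=1000).fit(f_train, train_y)
    return roc_auc_score(test_y, clf.predict_proba(f_test)[:, 1])
```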
Abstract:
Multisensory experiences influence subsequent memory performance and brain responses. Studies have thus far concentrated on semantically congruent pairings, leaving unresolved the influence of stimulus pairing and memory sub-types. Here, we paired images with unique, meaningless sounds during a continuous recognition task to determine whether purely episodic, single-trial multisensory experiences can incidentally impact subsequent visual object discrimination. Psychophysics and electrical neuroimaging analyses of visual evoked potentials (VEPs) compared responses to repeated images either paired or not with a meaningless sound during initial encounters. Recognition accuracy was significantly impaired for images initially presented as multisensory pairs, and this could not be explained in terms of differential attention or transfer of effects from encoding to retrieval. VEP modulations occurred at 100-130 ms and 270-310 ms and stemmed from topographic differences indicative of network configuration changes within the brain. Distributed source estimations localized the earlier effect to regions of the right posterior superior temporal gyrus (STG) and the later effect to regions of the middle temporal gyrus (MTG). Responses in these regions were stronger for images previously encountered as multisensory pairs. Only the later effect correlated with performance, such that greater MTG activity in response to repeated visual stimuli was linked with greater performance decrements. The present findings suggest that the brain networks involved in this discrimination may critically depend on whether multisensory events facilitate or impair later visual memory performance. More generally, the data support models whereby the effects of multisensory interactions persist to incidentally affect subsequent behavior as well as visual processing during its initial stages.
Abstract:
Multisensory experiences enhance perception and facilitate memory retrieval processes, even when only unisensory information is available for accessing such memories. Using fMRI, we identified human brain regions involved in discriminating visual stimuli according to past multisensory vs. unisensory experiences. Subjects performed a completely orthogonal task, discriminating repeated from initial image presentations intermixed within a continuous recognition task. Half of the initial presentations were multisensory, and all repetitions were exclusively visual. Despite only single-trial exposures to initial image presentations, accuracy in indicating image repetitions was significantly improved by past auditory-visual multisensory experiences over images only encountered visually. Similarly, regions within the lateral occipital complex (areas typically associated with visual object recognition processes) were more active for visual stimuli with multisensory than with unisensory pasts. Additional differential responses were observed in the anterior cingulate and frontal cortices. Multisensory experiences are registered by the brain even when of no immediate behavioral relevance, and can be used to categorize memories. These data reveal the functional efficacy of multisensory processing.
Abstract:
Within the framework of a ternary logic, this article focuses on the geographical map perceived as a visual object and on the question of surfaces of representation. We analyse the status of map making versus landscape representation, and the relations between a map and a painted picture. A ternary model of pictorial composition, perspective | light / pictorial field, is proposed. The frame of the map, articulating the space that is cut out and the space that is included, is discussed through a parallel between maps and painted pictures.
Abstract:
We perceive our environment through multiple sensory channels. Nonetheless, research has traditionally focused on the investigation of sensory processing within single modalities. Investigating how our brain integrates multisensory information is therefore of crucial importance for understanding how organisms cope with a constantly changing and dynamic environment. During my thesis I investigated how multisensory events impact our perception and brain responses, both when auditory-visual stimuli are presented simultaneously and when multisensory events at one point in time impact later unisensory processing. In "Looming signals reveal synergistic principles of multisensory integration" (Cappe, Thelen et al., 2012), we investigated the neuronal substrates involved in motion detection in depth under multisensory vs. unisensory conditions. We showed that congruent auditory-visual looming (i.e. approaching) signals are preferentially integrated by the brain. Further, we showed that early effects under these conditions are relevant for behavior, effectively speeding up responses to these combined stimulus presentations. In "Electrical neuroimaging of memory discrimination based on single-trial multisensory learning" (Thelen et al., 2012), we investigated the behavioral impact of single encounters with meaningless auditory-visual object pairings upon subsequent visual object recognition. In addition to showing that these encounters lead to impaired recognition accuracy upon repeated visual presentations, we showed that the brain discriminates images as early as ~100 ms post-stimulus onset according to the initial encounter context. In "Single-trial multisensory memories affect later visual and auditory object recognition" (Thelen et al., in review), we addressed whether auditory object recognition is affected by single-trial multisensory memories, and whether recognition accuracy for sounds is affected by the initial encounter context in the same way as for visual objects. We found that this is in fact the case. Based on our behavioral findings, we propose that a common underlying brain network is differentially involved during the encoding and retrieval of images and sounds.
Abstract:
Visual object tracking has recently been one of the most popular research topics in computer vision. Hand tracking in particular has attracted significant attention, since it would enable many useful practical applications. However, hand tracking is still a very challenging problem that cannot be considered solved. The fact that almost every aspect of hand appearance can change is the fundamental reason for this difficulty. This thesis focused on 2D-based hand tracking in high-speed camera videos. During the project, a toolbox containing nine different tracking methods was assembled for this purpose. In the experiments, these methods were tested and compared against each other using both high-speed videos recorded during the project and publicly available normal-speed videos. The results revealed that tracking accuracy varied considerably depending on the video and the method. No single method was clearly the best across all videos, but three methods, CT, HT, and TLD, performed better than the others overall. Moreover, the results provide insights into the suitability of each method for different types and situations of hand tracking.
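Tracker comparisons of this kind are commonly scored per frame by intersection-over-union (IoU) against ground-truth boxes, summarized as a success rate at a chosen overlap threshold. A hedged sketch of that standard metric follows; the thesis's exact evaluation protocol is not reproduced, and the (x, y, w, h) box format is an assumption.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x, y, w, h) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def success_rate(pred_boxes, gt_boxes, thresh=0.5):
    """Fraction of frames where the tracker's overlap exceeds the threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return float(np.mean(np.array(overlaps) >= thresh))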
Abstract:
Response times in a visual object recognition task decrease significantly when the targets can be distinguished on the basis of two redundant attributes. A redundancy gain for two attributes is a common finding in the literature, but a gain from three redundant attributes had only been observed when those three attributes came from three different modalities (tactile, auditory and visual). The present study demonstrates that a redundancy gain for three attributes of the same modality is indeed possible. It also includes a more detailed investigation of the characteristics of the redundancy gain. Besides the decrease in response times, these include a particularly pronounced decrease in the minimal response times and an increase in the symmetry of the response time distribution. This study presents evidence that neither race models nor coactivation models can explain all the characteristics of the redundancy gain. In this context, we introduce a new method for evaluating the triple redundancy gain based on the performance for doubly redundant targets. The cascade model is presented to explain the results of this study. This model comprises several processing channels that are triggered by a cascade of activations before satisfying a single decision criterion. It offers a unified approach to previous research on redundancy gain. The analysis of the characteristics of response time distributions (their mean, symmetry, shift and spread) is an essential tool for this study. It was important to find a statistical test capable of reflecting differences in all of these characteristics. We address the problem of analyzing response times without loss of information, as well as the inadequacy of common analysis methods in this context, such as pooling the response times of several participants (e.g. Vincentizing). Distribution tests, the best known being the Kolmogorov-Smirnov test, are a better alternative for comparing distributions, response time distributions in particular. A test still unknown in psychology is introduced: the two-sample Anderson-Darling test. The two tests are compared, and we present conclusive evidence of the power of the Anderson-Darling test: when comparing distributions that differ only in (1) their shift, (2) their spread, (3) their symmetry, or (4) their tails, the Anderson-Darling test detects the differences better. Moreover, the Anderson-Darling test has a Type I error rate that corresponds exactly to alpha, whereas the Kolmogorov-Smirnov test is too conservative. Consequently, the Anderson-Darling test requires fewer data to reach sufficient statistical power.
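Both tests are available off the shelf in SciPy. The sketch below compares two simulated response-time-like samples that differ only in spread, the kind of difference the abstract argues the Anderson-Darling test detects more readily; the distribution parameters are illustrative, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=500, scale=50, size=200)   # RT-like sample 1 (ms)
b = rng.normal(loc=500, scale=65, size=200)   # same mean, wider spread

print(stats.ks_2samp(a, b))          # two-sample Kolmogorov-Smirnov test
print(stats.anderson_ksamp([a, b]))  # k-sample Anderson-Darling test
```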
Abstract:
This paper presents a unique two-stage image restoration framework, intended especially for further application to a novel rectangular poor-pixels detector which, with its miniature size, light weight and low power consumption, has great value for micro vision systems. To meet the demand for fast processing, only a few measured images, shifted at the subpixel level, need to join the fusion operation, fewer than are required by traditional approaches. Through maximum likelihood estimation with a least squares method, a preliminary restored image is linearly interpolated. After noise removal via Canny-operator-based level set evolution, the final high-quality restored image is obtained. Experimental results demonstrate the effectiveness of the proposed framework. It is a sensible step towards subsequent image understanding and object identification.
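A hedged sketch of the first stage under strong simplifying assumptions: if each low-resolution frame samples the high-resolution grid at an integer offset (the subpixel shift times the scale factor), the least-squares estimate reduces to accumulating each observation at its shifted position and averaging. The paper's interpolation and level-set denoising stages are not reproduced, and all names and the sampling model here are assumptions.

```python
import numpy as np

def ls_fuse(frames, shifts, scale=2):
    """Least-squares fusion of subpixel-shifted frames onto a finer grid.

    frames: list of (H, W) low-resolution images
    shifts: per-frame (dy, dx) integer offsets in high-res pixels, 0 <= d < scale
    """
    H, W = frames[0].shape
    num = np.zeros((H * scale, W * scale))
    den = np.zeros_like(num)
    for img, (dy, dx) in zip(frames, shifts):
        ys = np.arange(H) * scale + dy
        xs = np.arange(W) * scale + dx
        num[np.ix_(ys, xs)] += img   # accumulate observations at shifted sites
        den[np.ix_(ys, xs)] += 1.0
    # Averaging observed sites is the least-squares solution under this
    # sampling-only model; unobserved sites (zeros) would next be interpolated.
    return np.where(den > 0, num / np.maximum(den, 1.0), 0.0)
```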