964 resultados para Visual properties
Resumo:
A method is presented for the visual analysis of objects by computer. It is particularly well suited for opaque objects with smoothly curved surfaces. The method extracts information about the object's surface properties, including measures of its specularity, texture, and regularity. It also aids in determining the object's shape. The application of this method to a simple recognition task ??e recognition of fruit ?? discussed. The results on a more complex smoothly curved object, a human face, are also considered.
Resumo:
My thesis studies how people pay attention to other people and the environment. How does the brain figure out what is important and what are the neural mechanisms underlying attention? What is special about salient social cues compared to salient non-social cues? In Chapter I, I review social cues that attract attention, with an emphasis on the neurobiology of these social cues. I also review neurological and psychiatric links: the relationship between saliency, the amygdala and autism. The first empirical chapter then begins by noting that people constantly move in the environment. In Chapter II, I study the spatial cues that attract attention during locomotion using a cued speeded discrimination task. I found that when the motion was expansive, attention was attracted towards the singular point of the optic flow (the focus of expansion, FOE) in a sustained fashion. The more ecologically valid the motion features became (e.g., temporal expansion of each object, spatial depth structure implied by distribution of the size of the objects), the stronger the attentional effects. However, compared to inanimate objects and cues, people preferentially attend to animals and faces, a process in which the amygdala is thought to play an important role. To directly compare social cues and non-social cues in the same experiment and investigate the neural structures processing social cues, in Chapter III, I employ a change detection task and test four rare patients with bilateral amygdala lesions. All four amygdala patients showed a normal pattern of reliably faster and more accurate detection of animate stimuli, suggesting that advantageous processing of social cues can be preserved even without the amygdala, a key structure of the “social brain”. People not only attend to faces, but also pay attention to others’ facial emotions and analyze faces in great detail. Humans have a dedicated system for processing faces and the amygdala has long been associated with a key role in recognizing facial emotions. In Chapter IV, I study the neural mechanisms of emotion perception and find that single neurons in the human amygdala are selective for subjective judgment of others’ emotions. Lastly, people typically pay special attention to faces and people, but people with autism spectrum disorders (ASD) might not. To further study social attention and explore possible deficits of social attention in autism, in Chapter V, I employ a visual search task and show that people with ASD have reduced attention, especially social attention, to target-congruent objects in the search array. This deficit cannot be explained by low-level visual properties of the stimuli and is independent of the amygdala, but it is dependent on task demands. Overall, through visual psychophysics with concurrent eye-tracking, my thesis found and analyzed socially salient cues and compared social vs. non-social cues and healthy vs. clinical populations. Neural mechanisms underlying social saliency were elucidated through electrophysiology and lesion studies. I finally propose further research questions based on the findings in my thesis and introduce my follow-up studies and preliminary results beyond the scope of this thesis in the very last section, Future Directions.
Resumo:
Previous studies have found that the lateral posterior fusiform gyri respond more robustly to pictures of animals than pictures of manmade objects and suggested that these regions encode the visual properties characteristic of animals. We suggest that such effects actually reflect processing demands arising when items with similar representations must be finely discriminated. In a positron emission tomography (PET) study of category verification with colored photographs of animals and vehicles, there was robust animal-specific activation in the lateral posterior fusiform gyri when stimuli were categorized at an intermediate level of specificity (e.g., dog or car). However, when the same photographs were categorized at a more specific level (e.g., Labrador or BMW), these regions responded equally strongly to animals and vehicles. We conclude that the lateral posterior fusiform does not encode domain-specific representations of animals or visual properties characteristic of animals. Instead, these regions are strongly activated whenever an item must be discriminated from many close visual or semantic competitors. Apparent category effects arise because, at an intermediate level of specificity, animals have more visual and semantic competitors than do artifacts.
Resumo:
Monkeys have strong abilities to remember the visual properties of potential food sources for survival in the nature. The present study demonstrated the first observations of rhesus monkeys learning to solve complex spatial mazes in which routes were guid
Resumo:
Threat-relevant stimuli such as fear faces are prioritized by the human visual system. Recent research suggests that this prioritization begins during unconscious processing: A specialized (possibly subcortical) pathway evaluates the threat relevance of visual input, resulting in preferential access to awareness for threat stimuli. Our data challenge this claim. We used a continuous flash suppression (CFS) paradigm to present emotional face stimuli outside of awareness. It has been shown using CFS that salient (e.g., high contrast) and recognizable stimuli (faces, words) become visible more quickly than less salient or less recognizable stimuli. We found that although fearful faces emerge from suppression faster than other faces, this was wholly explained by their low-level visual properties, rather than their emotional content. We conclude that, in the competition for visual awareness, the visual system prefers and promotes unconscious stimuli that are more “face-like,” but the emotional content of a face has no effect on stimulus salience.
Resumo:
The efficiency in image classification tasks can be improved using combined information provided by several sources, such as shape, color, and texture visual properties. Although many works proposed to combine different feature vectors, we model the descriptor combination as an optimization problem to be addressed by evolutionary-based techniques, which compute distances between samples that maximize their separability in the feature space. The robustness of the proposed technique is assessed by the Optimum-Path Forest classifier. Experiments showed that the proposed methodology can outperform individual information provided by single descriptors in well-known public datasets. © 2012 IEEE.
Resumo:
Huge image collections are becoming available lately. In this scenario, the use of Content-Based Image Retrieval (CBIR) systems has emerged as a promising approach to support image searches. The objective of CBIR systems is to retrieve the most similar images in a collection, given a query image, by taking into account image visual properties such as texture, color, and shape. In these systems, the effectiveness of the retrieval process depends heavily on the accuracy of ranking approaches. Recently, re-ranking approaches have been proposed to improve the effectiveness of CBIR systems by taking into account the relationships among images. The re-ranking approaches consider the relationships among all images in a given dataset. These approaches typically demands a huge amount of computational power, which hampers its use in practical situations. On the other hand, these methods can be massively parallelized. In this paper, we propose to speedup the computation of the RL-Sim algorithm, a recently proposed image re-ranking approach, by using the computational power of Graphics Processing Units (GPU). GPUs are emerging as relatively inexpensive parallel processors that are becoming available on a wide range of computer systems. We address the image re-ranking performance challenges by proposing a parallel solution designed to fit the computational model of GPUs. We conducted an experimental evaluation considering different implementations and devices. Experimental results demonstrate that significant performance gains can be obtained. Our approach achieves speedups of 7x from serial implementation considering the overall algorithm and up to 36x on its core steps.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Die BBC-Serie SHERLOCK war 2011 eine der meistexportierten Fernsehproduktionen Großbritanniens und wurde weltweit in viele Sprachen übersetzt. Eine der Herausforderungen bei der Übersetzung stellen die Schrifteinblendungen der Serie (kurz: Inserts) dar. Die Inserts versprachlichen die Gedanken des Protagonisten, bilden schriftliche und digitale Kommunikation ab und zeichnen sich dabei durch ihre visuelle Auffälligkeit und teilweise als einzige Träger sprachlicher Kommunikation aus, womit sie zum wichtigen ästhetischen und narrativen Mittel in der Serie werden. Interessanterweise sind in der Übersetztung alle stilistischen Eigenschaften der Original-Inserts erhalten. In dieser Arbeit wird einerseits untersucht, wie Schrifteinblendungen im Film theoretisch beschrieben werden können, und andererseits, was sie in der Praxis so übersetzt werden können, wie es in der deutschen Version von Sherlock geschah. Zur theoretischen Beschreibung werden zunächst die Schrifteinblendungen in Sherlock Untertitelungsnormen anhand relevanter grundlegender semiotischer Dimensionen gegenübergestellt. Weiterhin wird das Verhältnis zwischen Schrifteinblendungen und Filmbild erkundet. Dazu wird geprüft, wie gut verschiedene Beschreibungsansätze zu Text-Bild-Verhältnissen aus der Sprachwissenschaft, Comicforschung, Übersetzungswissenschaft und Typografie die Einblendungen in Sherlock erklären können. Im praktischen Teil wird die Übersetzung der Einblendungen beleuchtet. Der Übersetzungsprozess bei der deutschen Version wird auf Grundlage eines Experteninterviews mit dem Synchronautor der Serie rekonstruiert, der auch für die Formulierung der Inserts zuständig war. Abschließend werden spezifische Übersetzungsprobleme der Inserts aus der zweiten Staffel von SHERLOCK diskutiert. Es zeigt sich, dass Untertitelungsnormen zur Beschreibung von Inserts nicht geeignet sind, da sie in Dimensionen wie Position, grafische Gestaltung, Animation, Soundeffekte, aber auch Timing stark eingeschränkt sind. Dies lässt sich durch das historisch geprägte Verständnis von Untertiteln erklären, die als möglichst wenig störendes Beiwerk zum fertigen Filmbild und -ablauf (notgedrungen) hinzugefügt werden, wohingegen für die Inserts in SHERLOCK teilweise sogar ein zentraler Platz in der Bild- und Szenenkomposition bereits bei den Dreharbeiten vorgesehen wurde. In Bezug auf Text-Bild-Verhältnisse zeigen sich die größten Parallelen zu Ansätzen aus der Comicforschung, da auch dort schriftliche Texte im Bild eingebettet sind anstatt andersherum. Allerdings sind auch diese Ansätze zur Beschreibung von Bewegung und Ton unzureichend. Die Erkundung der Erklärungsreichweite weiterer vielversprechender Konzepte, wie Interface und Usability, bleibt ein Ziel für künftige Studien. Aus dem Experteninterview lässt sich schließen, dass die Übersetzung von Inserts ein neues, noch unstandardisiertes Verfahren ist, in dem idiosynkratische praktische Lösungen zur sprachübergreifenden Kommunikation zwischen verschiedenen Prozessbeteiligten zum Einsatz kommen. Bei hochqualitative Produktionen zeigt ist auch für die ersetzende Insertübersetzung der Einsatz von Grafikern unerlässlich, zumindest für die Erstellung neuer Inserts als Übersetzungen von gefilmtem Text (Display). Hierbei sind die theoretisch möglichen Synergien zwischen Sprach- und Bildexperten noch nicht voll ausgeschöpft. Zudem zeigt sich Optimierungspotential mit Blick auf die Bereitstellung von sorgfältiger Dokumentation zur ausgangssprachlichen Version. Diese wäre als Referenzmaterial für die Übersetzung insbesondere auch für Zwecke der internationalen Qualitätssicherung relevant. Die übersetzten Inserts in der deutschen Version weisen insgesamt eine sehr hohe Qualität auf. Übersetzungsprobleme ergeben sich für das genretypische Element der Codes, die wegen ihrer Kompaktheit und multiplen Bezügen zum Film eine Herausforderung darstellen. Neben weiteren bekannten Übersetzungsproblemen wie intertextuellen Bezügen und Realia stellt sich immer wieder die Frage, wieviel der im Original dargestellten Insert- und Displaytexte übersetzt werden müssen. Aus Gründen der visuellen Konsistenz wurden neue Inserts zur Übersetzung von Displays notwendig. Außerdem stellt sich die Frage insbesondere bei Fülltexten. Sie dienen der Repräsentation von Text und der Erweiterung der Grenzen der fiktiv dargestellten Welt, sind allerdings mit hohem Übersetzungsaufwand bei minimaler Bedeutung für die Handlung verbunden.
Classification of Paintings by Artist, Movement, and Indoor Setting Using MPEG-7 Descriptor Features
Resumo:
ACM Computing Classification System (1998): I.4.9, I.4.10.
Resumo:
Data Visualization is widely used to facilitate the comprehension of information and find relationships between data. One of the most widely used techniques for multivariate data (4 or more variables) visualization is the 2D scatterplot. This technique associates each data item to a visual mark in the following way: two variables are mapped to Cartesian coordinates so that a visual mark can be placed on the Cartesian plane; the others variables are mapped gradually to visual properties of the mark, such as size, color, shape, among others. As the number of variables to be visualized increases, the amount of visual properties associated to the mark increases as well. As a result, the complexity of the final visualization is higher. However, increasing the complexity of the visualization does not necessarily implies a better visualization and, sometimes, it provides an inverse situation, producing a visually polluted and confusing visualization—this problem is called visual properties overload. This work aims to investigate whether it is possible to work around the overload of the visual channel and improve insight about multivariate data visualized through a modification in the 2D scatterplot technique. In this modification, we map the variables from data items to multisensoriy marks. These marks are composed not only by visual properties, but haptic properties, such as vibration, viscosity and elastic resistance, as well. We believed that this approach could ease the insight process, through the transposition of properties from the visual channel to the haptic channel. The hypothesis was verified through experiments, in which we have analyzed (a) the accuracy of the answers; (b) response time; and (c) the grade of personal satisfaction with the proposed approach. However, the hypothesis was not validated. The results suggest that there is an equivalence between the investigated visual and haptic properties in all analyzed aspects, though in strictly numeric terms the multisensory visualization achieved better results in response time and personal satisfaction.
Resumo:
The safety of post-earthquake structures is evaluated manually through inspecting the visible damage inflicted on structural elements. This process is time-consuming and costly. In order to automate this type of assessment, several crack detection methods have been created. However, they focus on locating crack points. The next step, retrieving useful properties (e.g. crack width, length, and orientation) from the crack points, has not yet been adequately investigated. This paper presents a novel method of retrieving crack properties. In the method, crack points are first located through state-of-the-art crack detection techniques. Then, the skeleton configurations of the points are identified using image thinning. The configurations are integrated into the distance field of crack points calculated through a distance transform. This way, crack width, length, and orientation can be automatically retrieved. The method was implemented using Microsoft Visual Studio and its effectiveness was tested on real crack images collected from Haiti.