999 results for scene representation


Relevance:

70.00%

Publisher:

Abstract:

The goal of low-level vision is to estimate an underlying scene, given an observed image. Real-world scenes (e.g., albedos or shapes) can be very complex, conventionally requiring high-dimensional representations that are hard to estimate and store. We propose a low-dimensional representation, called a scene recipe, that relies on the image itself to describe the complex scene configurations. Shape recipes are an example: these are the regression coefficients that predict the bandpassed shape from bandpassed image data. We describe the benefits of this representation and show two uses illustrating its properties: (1) we improve stereo shape estimates by learning shape recipes at low resolution and applying them at full resolution; (2) shape recipes implicitly contain information about lighting and materials, and we use them for material segmentation.
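
As a rough illustration of a shape recipe as a set of regression coefficients, the sketch below fits a small linear filter that predicts a bandpassed shape signal from a bandpassed image signal, then reuses the learned coefficients on a longer signal. It is a one-dimensional toy and not the authors' method: the names `fit_recipe`, `apply_recipe`, and `solve_linear`, the three-tap window, and the synthetic data are all assumptions made for illustration, and the bandpass filtering is assumed to have been done beforehand.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_recipe(image_band, shape_band, taps=3):
    """Least-squares coefficients predicting shape_band[i] from a window of image_band."""
    half = taps // 2
    rows, targets = [], []
    for i in range(half, len(image_band) - half):
        rows.append(image_band[i - half:i + half + 1])
        targets.append(shape_band[i])
    # Normal equations: (A^T A) w = A^T b
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(taps)] for p in range(taps)]
    atb = [sum(r[p] * t for r, t in zip(rows, targets)) for p in range(taps)]
    return solve_linear(ata, atb)


def apply_recipe(image_band, recipe):
    """Predict the shape band at every position where the full window fits."""
    half = len(recipe) // 2
    return [
        sum(w * v for w, v in zip(recipe, image_band[i - half:i + half + 1]))
        for i in range(half, len(image_band) - half)
    ]


# Synthetic demo: learn on a short ("low resolution") signal, apply to a longer one.
rng = random.Random(0)
true_recipe = [0.5, -1.0, 0.25]  # hypothetical ground-truth coefficients
coarse_image = [rng.uniform(-1, 1) for _ in range(40)]
coarse_shape = [0.0] + apply_recipe(coarse_image, true_recipe) + [0.0]
learned = fit_recipe(coarse_image, coarse_shape)
fine_image = [rng.uniform(-1, 1) for _ in range(160)]
predicted_fine_shape = apply_recipe(fine_image, learned)
```

The point of the toy is that the stored representation is only the handful of coefficients in `learned`; the image supplies the rest of the detail when the recipe is applied.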

Relevance:

70.00%

Publisher:

Abstract:

In an immersive virtual environment, observers fail to notice the expansion of a room around them and consequently make gross errors when comparing the size of objects. This result is difficult to explain if the visual system continuously generates a 3-D model of the scene based on known baseline information from interocular separation or proprioception as the observer walks. An alternative is that observers use view-based methods to guide their actions and to represent the spatial layout of the scene. In this case, they may have an expectation of the images they will receive but be insensitive to the rate at which images arrive as they walk. We describe the way in which the eye movement strategy of animals simplifies motion processing if their goal is to move towards a desired image and discuss dorsal and ventral stream processing of moving images in that context. Although many questions about view-based approaches to scene representation remain unanswered, the solutions are likely to be highly relevant to understanding biological 3-D vision.

Relevance:

60.00%

Publisher:

Abstract:

The interface has become an increasingly significant factor in the development of online shopping, yet in practice it has received inadequate attention from both society and researchers. This study therefore aims to improve understanding of the engineering psychology factors that will play a crucial role in online shopping representation, and of the relations between them, through experimental research. I hope it can serve as a basic reference for the practical application of online shopping representation patterns and for further study.

The thesis analyzes the problem on the basis of engineering psychology principles from three aspects: the person (users), the task, and the information environment. It holds that the system overview and the information behavior model strongly affect users' activities on the web, and that the representation pattern of an information system affects how the system overview and behavior pattern form, which in turn affects users' performance in the information system. On this basis, a three-dimensional conceptual model is presented that relates the crucial factors: media representation pattern, system hierarchy, and objects in an information unit. Eight hypotheses about the engineering psychology factors of virtual reality (VR) representation in online shopping systems were then put forward, and four experiments were conducted to test them.

- Experiment one studied how three kinds of single-media representation pattern influence the formation of the system overview and information behavior, in terms of task performance, operating errors, overall satisfaction, mental workload, etc.
- Experiment two studied how the combined-media representation pattern of the system hierarchy influences users' behavior.
- Experiment three studied the hierarchical structure of the VR representation pattern and how its width and depth affect system behavior.
- Experiment four studied the location relations between different parties in a VR scene (information unit).

The results are as follows.

- Structure dimension: In the VR representation pattern, increasing width slowed users more than increasing depth. Although subjects performed quite slowly in wider environments, their error rate was lowest there.
- Hierarchy representation pattern:
1. Among the representation patterns of the three media, no significant differences were found in task completion speed, error rate, satisfaction, mental workload, etc., but the figure-aided pattern gave the worst results on all of these measures.
2. In the early stage of the task and at the first level of the hierarchy, subjects performed more slowly in the VR pattern than in the text pattern. As the task progressed and users moved to deeper levels of the hierarchy, their speed in the VR representation pattern became the highest.
3. The VR representation pattern was more effective than the text pattern at higher levels of the system. The representation pattern at the highest level has the greatest impact on system behavior performance, whereas using VR representation only in the middle part of the hierarchy gives the worst results.
4. Activity errors were more frequent in single-media representation patterns than in combined-media representation patterns.
5. Individual differences among subjects affected the representation pattern of the system. In the VR environment, the behavior tendency of party A had a significant negative correlation with the number of errors.
- VR-scene representation: Physical distance and flashing strongly influenced subjects' task performance, while psychological distance had no notable impact. Subjects' accuracy increased if objects with the same relation were in the same structural position, if psychological distance was close, or if the target object flashed (not reliable).

Although the article limits itself to current questions and analysis of online shopping, it can also apply to other relevant purposes on the web. The study focuses only on search tasks with a definite goal and does not consider other task conditions or their relations with other navigation tools. I hope it lays a good foundation for further research in this area.

Relevance:

60.00%

Publisher:

Abstract:

In low-level vision, representations of scene properties such as shape and albedo are very high-dimensional because they have to describe complicated structures. The approach proposed here is to let the image itself bear as much of the representational burden as possible. In many situations, scene and image are closely related and it is possible to find a functional relationship between them. The scene information can be represented in reference to the image, where the functional specifies how to translate the image into the associated scene. We illustrate the use of this representation for encoding shape information. We show that this representation has appealing properties such as locality and slow variation across space and scale. These properties provide a way of improving shape estimates coming from other sources of information, such as stereo.

Relevance:

60.00%

Publisher:

Abstract:

Radio advertising is suffering from a remarkable crisis of creativity, as it has not yet found its role in a radio model based on voice locution and information genres. This article suggests the need to implement a peripheral or heuristic strategy to attract and hold listeners’ attention. Within this framework, narration and scene representation are proposed as suitable persuasion techniques. The objective is to design a useful conceptual tool for the efficient creative conception of narration in the service of a given commercial strategy. First, the concept of narrative persuasion is grounded in the possibilities of the sound code. Second, the keys of scene representation and commercial strategy (brand, product, advantage, benefit and target) within the sound message are presented. Third, these keys are articulated in a model. This model is pre-tested by analyzing eight different radio ad cases.

Relevance:

60.00%

Publisher:

Abstract:

Set in the borderlands between Letterkenny and Derry-Londonderry, a landscape scarred by geological fold, river and cartographer’s pen, the Ulster crime novelist Brian McGilloway chronicles the hopes and fears of a contemporary society unable to escape a complicated history, redolent and entwined with the voices of its ‘ghosts of its past.’ Through his choice of chief protagonist, An Garda Síochána officer Benedict Devlin, McGilloway turns detective to critically investigate both the seemingly straightforward and the unseen dwelling in the rural Ulster landscape. Following in the footsteps of Nordic and Tartan Noir in making commentary on current society, McGilloway recognises the importance of the past in trying to reach an understanding of the present. His critique, however, goes beyond criminal behaviour motivated primarily by politics or religion, allowing a deeper and more meaningful diagnosis of the ‘state of the nation’. Place, name and event become especially important in contextualising the liminality of McGilloway’s real rural border settings. In doing so, McGilloway continues in the rich tradition of Ulster poets such as Heaney, MacNeice, Muldoon and Hewitt in trying to rationalise the man-made amidst the elemental in the land of both the ‘Planter & The Gael.’ History, language, tradition and the sacral are all instruments of investigation in helping McGilloway present a revealing pathology and atlas of our times to his readers. Turning literary investigator, the author contends that there is much to learn from this physiography, not just for the borderlands region, but for the wider countryside and society beyond. Keywords: Cultural Atlas, Crime Fiction, Place, Poetry, Rural.

Relevance:

60.00%

Publisher:

Abstract:

The main subject of this thesis is the study of algorithms for the automatic processing and representation of data, in particular information obtained by sensors mounted on board vehicles (2D and 3D), applied in the context of driving assistance systems. The work addresses some of the problems that both autonomous driving (AD) systems and advanced driver assistance systems (ADAS) face today. The document consists of two parts. The first describes the design, construction, and development of three robotic prototypes, including details of the sensors mounted on board the robots, the algorithms, and the software architectures. These robots were used as test platforms to test and validate the proposed techniques. They also took part in several autonomous driving competitions, achieving very good results. The second part of the document presents several algorithms used to generate intermediate representations of sensor data. These can be used to improve existing techniques for pattern recognition, detection, or navigation, and thereby contribute to future AD or ADAS applications. Since autonomous vehicles carry a large number of sensors of different kinds, intermediate representations are particularly suitable, because they can handle problems related to the diverse nature of the data (2D, 3D, photometric, etc.), to the asynchronous character of the data (multiple sensors sending data at different rates), or to data alignment (calibration problems, different sensors providing different measurements of the same object).
In this context, new techniques are proposed for computing a multi-camera, multi-modal inverse perspective mapping representation, for performing color correction between images so as to obtain good-quality mosaics, and for generating a scene representation based on polygonal primitives that can handle large amounts of 3D and 2D data and can refine the representation as new sensor data are received.

Relevance:

60.00%

Publisher:

Abstract:

We propose a scheme for indoor place identification based on the recognition of global scene views. Scene views are encoded using a holistic representation that provides low-resolution spatial and spectral information. The holistic nature of the representation dispenses with the need to rely on specific objects or local landmarks and also renders it robust against variations in object configurations. We demonstrate the scheme on the problem of recognizing scenes in video sequences captured while walking through an office environment. We develop a method for distinguishing between 'diagnostic' and 'generic' views and also evaluate changes in system performance as a function of the amount of training data available and the complexity of the representation.

Relevance:

60.00%

Publisher:

Abstract:

Graduate Program in History - FCLAS

Relevance:

40.00%

Publisher:

Abstract:

The ability to detect unusual events in surveillance footage as they happen is a highly desirable feature for a surveillance system. However, this problem remains challenging in crowded scenes due to occlusions and the clustering of people. In this paper, we propose using the Distributed Behavior Model (DBM), which has been widely used in computer graphics, for video event detection. Our approach does not rely on object tracking and is robust to camera movements. We use sparse coding for classification and test our approach on various datasets. Our proposed approach outperforms a state-of-the-art method that uses the social force model and Latent Dirichlet Allocation.

Relevance:

30.00%

Publisher:

Abstract:

Efficient and effective feature detection and representation is an important consideration when processing videos, and a large number of applications such as motion analysis, 3D scene understanding, and tracking depend on it. Among the various feature description methods, local features are becoming increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational complexity, their performance is still too limited for real-world applications. Furthermore, rapid increases in the uptake of mobile devices have increased the demand for algorithms that can run with reduced memory and computational requirements. In this paper we propose a semi-binary feature detector-descriptor based on the BRISK detector, which can detect and represent videos with significantly reduced computational requirements while achieving performance comparable to state-of-the-art spatio-temporal feature descriptors. First, the BRISK feature detector is applied frame by frame to detect interest points; the detected key points are then compared against consecutive frames for significant motion. Key points with significant motion are encoded with the BRISK descriptor in the spatial domain and the Motion Boundary Histogram in the temporal domain. This descriptor is not only lightweight but also has lower memory requirements because of the binary nature of the BRISK descriptor, which makes applications on handheld devices possible. We evaluate the performance of the detector-descriptor combination in the context of action classification with a standard, popular bag-of-features framework with an SVM. Experiments are carried out on two popular datasets of varying complexity, and we demonstrate performance comparable to other descriptors with reduced computational complexity.
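
To illustrate why a binary descriptor is cheap to store and compare, the following sketch matches descriptors by Hamming distance, that is, by counting differing bits. It is a generic illustration and not the paper's implementation: the function names, the `max_distance` threshold, and the representation of a descriptor as a Python `bytes` object are assumptions.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def nearest_descriptor(query, candidates, max_distance):
    """Index of the closest candidate within max_distance bits, else None."""
    best_index, best_distance = None, max_distance + 1
    for index, candidate in enumerate(candidates):
        distance = hamming_distance(query, candidate)
        if distance < best_distance:  # strict: ties keep the earlier candidate
            best_index, best_distance = index, distance
    return best_index
```

Each comparison needs only XOR and bit counting on a few bytes, whereas a floating-point descriptor needs a multiply and add per dimension and several bytes per value.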

Relevance:

30.00%

Publisher:

Abstract:

This paper presents an effective feature representation method in the context of activity recognition. Efficient and effective feature representation plays a crucial role not only in activity recognition, but also in a wide range of applications such as motion analysis, tracking, and 3D scene understanding. In the context of activity recognition, local features are increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational requirements, their performance is still limited for real-world applications due to a lack of contextual information and models not being tailored to specific activities. We propose a new activity representation framework to address the shortcomings of the popular but simple bag-of-words approach. In our framework, a multiple-instance SVM (mi-SVM) is first used to identify positive features for each action category, and the k-means algorithm is used to generate a codebook. Locality-constrained linear coding is then used to encode the features with the generated codebook, followed by spatio-temporal pyramid pooling to convey the spatio-temporal statistics. Finally, an SVM is used to classify the videos. Experiments carried out on two popular datasets of varying complexity demonstrate significant performance improvement over the baseline bag-of-features method.
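
For reference, the baseline bag-of-features encoding that the abstract compares against can be sketched as follows: each local feature is assigned to its nearest codeword and a video is summarized by the normalized histogram of assignments. This is a minimal sketch of the generic baseline only, assuming a codebook has already been produced (for example by k-means). It omits the mi-SVM feature selection, locality-constrained linear coding, and spatio-temporal pyramid pooling of the proposed framework, and the function names are hypothetical.

```python
def nearest_codeword(feature, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def squared_distance(codeword):
        return sum((f - c) ** 2 for f, c in zip(feature, codeword))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def bag_of_features(features, codebook):
    """L1-normalized histogram of hard codeword assignments for one video."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_codeword(feature, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)  # no features detected in this video
    return [count / total for count in counts]
```

Because every feature votes for exactly one codeword and positions are discarded, the histogram carries no spatial or temporal layout, which is the kind of missing contextual information the abstract refers to.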

Relevance:

30.00%

Publisher:

Abstract:

Deep convolutional neural networks (DCNNs) have been employed in many computer vision tasks with great success due to their robustness in feature learning. One of the advantages of DCNNs is their representation robustness to object locations, which is useful for object recognition tasks. However, this also discards spatial information, which is useful when dealing with topological information of the image (e.g. scene labeling, face recognition). In this paper, we propose a deeper and wider network architecture to tackle the scene labeling task. The depth is achieved by incorporating predictions from multiple early layers of the DCNN. The width is achieved by combining multiple outputs of the network. We then further refine the parsing task by adopting graphical models (GMs) as a post-processing step to incorporate spatial and contextual information into the network. The new strategy for a deeper, wider convolutional network coupled with graphical models has shown promising results on the PASCAL-Context dataset.