56 results for layout-automatico testo-a-fronte VDP


Relevance:

10.00% 10.00%

Publisher:

Abstract:

Interpretation of video information is a difficult task for computer vision and machine intelligence. In this paper we examine the utility of a non-image based source of information about video contents, namely the shot list, and study its use in aiding image interpretation. We show how the shot list may be analysed to produce a simple summary of the 'who and where' of a documentary or interview video. In order to detect the subject of a video we use the notion of a 'shot syntax' of a particular genre to isolate actual interview sections.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This work combines natural language understanding and image processing with incremental learning to develop a system that can automatically interpret and index American Football. We have developed a model for representing spatio-temporal characteristics of multiple objects in dynamic scenes in this domain. Our representation combines expert knowledge, domain knowledge, spatial knowledge and temporal knowledge. We also present an incremental learning algorithm to improve the knowledge base as well as to keep previously developed concepts consistent with new data. The advantages of the incremental learning algorithm are that it does not split concepts and that it generates a compact conceptual hierarchy which does not store instances.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper describes a general-purpose, flexible technique which uses physical modelling techniques for determining the features of a 3D object that are visible from any predefined view. Physical modelling techniques are used to determine which of many different types of features are visible from a complete set of viewpoints. The power of this technique lies in its ability to detect and parameterise object features, regardless of object complexity. Raytracing is used to simulate the physical process by which object features are visible so that surface properties (e.g. specularity, transparency) as well as object boundaries can be used in the recognition process. Using this technique, occluding and non-occluding edge-based features are extracted using image processing techniques and then parameterised. Features caused by specularity are also extracted and qualitative descriptions for these are defined.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper addresses the problem of markerless tracking of a human in full 3D with a high-dimensional (29D) body model. Most work in this area has focused on achieving accurate tracking in order to replace marker-based motion capture, but does so at the cost of relying on relatively clean observing conditions. This paper takes a different perspective, proposing a body-tracking model that is explicitly designed to handle real-world conditions such as occlusions by scene objects, failure recovery, long-term tracking, auto-initialisation, generalisation to different people and integration with action recognition. To achieve these goals, an action's motions are modelled with a variant of the hierarchical hidden Markov model. The model is quantitatively evaluated with several tests, including comparison to the annealed particle filter, tracking different people and tracking with a reduced resolution and frame rate.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Traditional methods of object recognition are reliant on shape and so are very difficult to apply in cluttered, wide-angle and low-detail views such as surveillance scenes. To address this, a method of indirect object recognition is proposed, where human activity is used to infer both the location and identity of objects. No shape analysis is necessary. The concept is dubbed 'interaction signatures', since the premise is that a human will interact with objects in ways characteristic of the function of that object - for example, a person sits in a chair and drinks from a cup. The human-centred approach means that recognition is possible in low-detail views and is largely invariant to the shape of objects within the same functional class. This paper implements a Bayesian network for classifying region patches with object labels, building upon our previous work in automatically segmenting and recognising a human's interactions with the objects. Experiments show that interaction signatures can successfully find and label objects in low-detail views and are equally effective at recognising test objects that differ markedly in appearance from the training objects.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we present a coherent approach using the hierarchical HMM with shared structures to extract the structural units that form the building blocks of an education/training video. Rather than using hand-crafted approaches to define the structural units, we use the data from nine training videos to learn the parameters of the HHMM, and thus naturally extract the hierarchy. We then study this hierarchy and examine the nature of the structure at different levels of abstraction. Since the observable is continuous, we also show how to extend the parameter learning in the HHMM to deal with continuous observations.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In order to enable high-level semantics-based video annotation and interpretation, we tackle the problem of automatic decomposition of motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs in film. We examine different rules and conventions followed as part of Film Grammar to guide and shape our algorithmic solution for determining a scene boundary. Two different techniques are proposed as new solutions in this paper. Our experimental results on 10 full-length movies show that our technique based on shot sequence coherence performs well and reasonably better than the color edges-based approach.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

One of the fundamental issues in building autonomous agents is to be able to sense, represent and react to the world. Some of the earlier work [Mor83, Elf90, AyF89] has aimed towards a reconstructionist approach, where a number of sensors are used to obtain input that is used to construct a model of the world that mirrors the real world. Sensing and sensor fusion was thus an important aspect of such work. Such approaches have had limited success, and some of the main problems were the issues of uncertainty arising from sensor error and errors that accumulated in metric, quantitative models. Recent research has therefore looked at different ways of examining the problems. Instead of attempting to get the most accurate and correct model of the world, these approaches look at qualitative models to represent the world, which maintain relative and significant aspects of the environment rather than all aspects of the world. The relevant aspects of the world that are retained are determined by the task at hand which in turn determines how to sense. That is, task directed or purposive sensing is used to build a qualitative model of the world, which though inaccurate and incomplete is sufficient to solve the problem at hand. This paper examines the issues of building up a hierarchical knowledge representation of the environment with limited sensor input that can be actively acquired by an agent capable of interacting with the environment. Different tasks require different aspects of the environment to be abstracted out. For example, low level tasks such as navigation require aspects of the environment that are related to layout and obstacle placement. For the agent to be able to reposition itself in an environment, significant features of spatial situations and their relative placement need to be kept. 
For the agent to reason about objects in space, for example to determine the position of one object relative to another, the representation needs to retain information on the relative locations of the start and finish of the objects, that is, the endpoints of objects on a grid. For the agent to be able to do high level planning, the agent may need only the relative position of the starting point and destination, and not the low level details of endpoints, visual clues and so on. This indicates that a hierarchical approach would be suitable, such that each level in the hierarchy is at a different level of abstraction, and thus suitable for a different task. At the lowest level, the representation contains low level details of the agent's motion and visual clues to allow the agent to navigate and reposition itself. At the next level of abstraction the aspects of the representation allow the agent to perform spatial reasoning, and finally the highest level of abstraction in the representation can be used by the agent for high level planning.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We examine localised sound energy patterns, or events, that we associate with high level affect experienced with films. The study of sound energy events in conjunction with their intended affect enables the analysis of film at a higher conceptual level, such as genre. The various affect/emotional responses we investigate in this paper are brought about by well established patterns of sound energy dynamics employed in audio tracks of horror films. This allows the examination of the thematic content of the films in relation to horror elements. We analyse the frequency of sound energy and affect events at a film level as well as at a scene level, and propose measures indicative of the film genre and scene content. Using 4 horror and 2 non-horror movies as experimental data, we establish a correlation between the sound energy event types and horrific thematic content within film, thus enabling an automated mechanism for genre typing and scene content labeling in film.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, we focus on the ‘reverse editing’ problem in movie analysis, i.e., the extraction of film takes, the original camera shots that a film editor extracts and arranges to produce a finished scene. The ability to disassemble final scenes and shots into takes is essential for nonlinear browsing, content annotation and the extraction of higher order cinematic constructs from film. In this work, we investigate agglomerative hierarchical clustering methods along with different similarity metrics and group distances for this task, and demonstrate our findings with 10 movies.
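As an illustration of the kind of agglomerative hierarchical clustering the abstract refers to, the following is a minimal sketch and not the authors' method. It assumes each shot is summarised by a small numeric descriptor, uses Euclidean distance as the similarity metric and average linkage as the group distance, and stops merging at a hypothetical `max_distance` threshold; the abstract does not specify any of these choices.

```python
from math import dist


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, max_distance):
    """Merge the closest pair of clusters until no pair is within max_distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_distance:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical 2-D shot descriptors (assumed, e.g. two colour statistics per shot).
shots = [(0.10, 0.20), (0.12, 0.21), (0.80, 0.90), (0.82, 0.88), (0.11, 0.19)]
takes = agglomerate(shots, max_distance=0.1)
print(takes)
```

Each returned group holds the indices of shots that would be attributed to the same take under these assumptions; swapping `average_linkage` for a single- or complete-linkage function changes the group distance without touching the merge loop.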

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper engages with the notion of interspace by examining an understudied and unpublished cycle of mosaics and frescoes destined for the main hall of the Palazzo dei Congressi in Rome’s south-western suburb of EUR, a major building project by Roman architect Adalberto Libera. It first provides a socio-historical and aesthetic background to the building of EUR as Rome’s international exposition of 1942, which aimed to celebrate the achievements of Italian (and fascist) civilisation. It then focuses on the concept of Romanità (or Roman-ness) as a mythical and idealised past that was engaged on a number of levels as a teleological foundation for the advent (and eternity) of fascist rule. This past was adopted, interpreted and made manifest at the urban scale in the master plan of EUR, at the architectural scale in the buildings and at the interior scale in the decorative programs incorporated in each. It argues that the Palazzo dei Congressi allows us to gain further insight into the notion of interspace as it exemplifies this on a number of physical, symbolic and temporal levels. Physically, in the urban space, architectural form and interiors; symbolically, in the content and compositional layout of the mosaics; and temporally, in the use of historical elision and conflation between mythical pasts and idealised present.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The problem of deriving spatial relationships between objects in general requires high-level abstract representation, and it would pose difficulties even for a human observer. Based on a formalism for spatial layouts proposed earlier, we present methods for deducing spatial relations between objects by an active, sighted agent in a large-scale environment. The deduction of spatial relations is based on simple visual clues, and thus this technique is more feasible than schemes that rely on complex object recognition.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Recent approaches to video indexing and retrieval are either pixel-oriented or object-oriented. While the former approaches focus on motion and changes thereto, the latter focus on spatial relations among objects in the scene. In this paper, a spatial knowledge representation technique combining both approaches is proposed. This representation supplements the spatial knowledge of visual objects with information about their pixel positions in the video frame. It provides a practical way to construct video indices, enabling searching for and retrieval of video sequences that contain motion as well as sparsely disjoint objects.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper examines film rhythm, an important expressive element in motion pictures, based on our ongoing study to exploit film grammar as a broad computational framework for the task of automated film and video understanding. Of the many, more or less elusive, narrative devices contributing to film rhythm, this paper discusses motion characteristics that form the basis of our analysis, and presents novel computational models for extracting rhythmic patterns induced through a perception of motion. In our rhythm model, motion behaviour is classified as being either nonexistent, fluid or staccato for a given shot. Shot neighbourhoods in movies are then grouped by proportional makeup of these motion behavioural classes to yield seven high-level rhythmic arrangements that prove to be adept at indicating likely scene content (e.g. dialogue or chase sequence) in our experiments. Underlying causes for this level of codification in our approach are postulated from film grammar, and are accompanied by detailed demonstration from real movies for the purposes of clarification.
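The two steps named in the abstract (labelling each shot's motion as nonexistent, fluid or staccato, then describing a shot neighbourhood by its proportional makeup of those classes) can be sketched roughly as follows. This is an illustrative sketch only: the per-frame motion magnitudes, both thresholds and the window size are assumptions, and the mapping from makeup to the paper's seven rhythmic arrangements is not given in the abstract and is not attempted here.

```python
from collections import Counter
from statistics import mean, pstdev

CLASSES = ("nonexistent", "fluid", "staccato")


def classify_shot(motion, still_below=0.05, staccato_above=0.5):
    """Label one shot from its per-frame motion magnitudes (thresholds are assumed)."""
    avg = mean(motion)
    if avg < still_below:
        return "nonexistent"
    # Assumption: jerky motion fluctuates strongly relative to its mean level.
    return "staccato" if pstdev(motion) / avg > staccato_above else "fluid"


def neighbourhood_makeup(labels, size=3):
    """Proportion of each motion class in every window of `size` consecutive shots."""
    makeup = []
    for start in range(len(labels) - size + 1):
        counts = Counter(labels[start:start + size])
        makeup.append({c: counts[c] / size for c in CLASSES})
    return makeup


# Hypothetical motion magnitudes for four consecutive shots.
shots = [
    [0.01, 0.02, 0.01],  # almost no motion
    [0.40, 0.42, 0.41],  # steady motion
    [0.10, 0.90, 0.05],  # erratic motion
    [0.80, 0.10, 0.95],  # erratic motion
]
labels = [classify_shot(s) for s in shots]
makeup = neighbourhood_makeup(labels, size=3)
print(labels)
print(makeup)
```

A neighbourhood dominated by the staccato class might, for instance, be a candidate chase sequence, but any such rule would have to come from the paper itself.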

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The problem of deriving spatial relationships between objects in general requires high-level abstract representation, and it would pose difficulties even for a human observer. Based on a formalism for spatial layouts proposed earlier [KiV92, VeK92], we present methods for deducing high level spatial relations between objects by an active, sighted agent in a large-scale environment. The deduction of spatial relations is based on simple visual clues, and thus this technique is more feasible than schemes that rely on complex object recognition.