10 resultados para 3D Video Telecommunication Multimedia
em Digital Peer Publishing
Resumo:
This paper presents an empirical study of affine invariant feature detectors to perform matching on video sequences of people with non-rigid surface deformation. Recent advances in feature detection and wide baseline matching have focused on static scenes. Video frames of human movement capture highly non-rigid deformation such as loose hair, cloth creases, skin stretching and free flowing clothing. This study evaluates the performance of six widely used feature detectors for sparse temporal correspondence on single view and multiple view video sequences. Quantitative evaluation is performed of both the number of features detected and their temporal matching against and without ground truth correspondence. Recall-accuracy analysis of feature matching is reported for temporal correspondence on single view and multiple view sequences of people with variation in clothing and movement. This analysis identifies that existing feature detection and matching algorithms are unreliable for fast movement with common clothing.
Resumo:
Visual fixation is employed by humans and some animals to keep a specific 3D location at the center of the visual gaze. Inspired by this phenomenon in nature, this paper explores the idea to transfer this mechanism to the context of video stabilization for a handheld video camera. A novel approach is presented that stabilizes a video by fixating on automatically extracted 3D target points. This approach is different from existing automatic solutions that stabilize the video by smoothing. To determine the 3D target points, the recorded scene is analyzed with a stateof- the-art structure-from-motion algorithm, which estimates camera motion and reconstructs a 3D point cloud of the static scene objects. Special algorithms are presented that search either virtual or real 3D target points, which back-project close to the center of the image for as long a period of time as possible. The stabilization algorithm then transforms the original images of the sequence so that these 3D target points are kept exactly in the center of the image, which, in case of real 3D target points, produces a perfectly stable result at the image center. Furthermore, different methods of additional user interaction are investigated. It is shown that the stabilization process can easily be controlled and that it can be combined with state-of-theart tracking techniques in order to obtain a powerful image stabilization tool. The approach is evaluated on a variety of videos taken with a hand-held camera in natural scenes.
Resumo:
This paper introduces a database of freely available stereo-3D content designed to facilitate research in stereo post-production. It describes the structure and content of the database and provides some details about how the material was gathered. The database includes examples of many of the scenarios characteristic to broadcast footage. Material was gathered at different locations including a studio with controlled lighting and both indoor and outdoor on-location sites with more restricted lighting control. The database also includes video sequences with accompanying 3D audio data recorded in an Ambisonics format. An intended consequence of gathering the material is that the database contains examples of degradations that would be commonly present in real-world scenarios. This paper describes one such artefact caused by uneven exposure in the stereo views, causing saturation in the over-exposed view. An algorithm for the restoration of this artefact is proposed in order to highlight the usefuiness of the database.
Resumo:
The explosion of multimedia digital content and the development of technologies that go beyond traditional broadcast and TV have rendered access to such content important for all end-users of these technologies. While originally developed for providing access to multimedia digital libraries, video search technologies assume now a more demanding role. In this paper, we attempt to shed light onto this new role of video search technologies, looking at the rapid developments in the related market, the lessons learned from state of art video search prototypes developed mainly in the digital libraries context and the new technological challenges that have risen. We focus on one of the latter, i.e., the development of cross-media decision mechanisms, drawing examples from REVEAL THIS, an FP6 project on the retrieval of video and language for the home user. We argue, that efficient video search holds a key to the usability of the new ”pervasive digital video” technologies and that it should involve cross-media decision mechanisms.
Resumo:
While sound and video may capture viewers' attention, interaction can captivate them. This has not been available prior to the advent of Digital Television. In fact, what lies at the heart of the Digital Television revolution is this new type of interactive content, offered in the form of interactive Television (iTV) services. On top of that, the new world of converged networks has created a demand for a new type of converged services on a range of mobile terminals (Tablet PCs, PDAs and mobile phones). This paper aims at presenting a new approach to service creation that allows for the semi-automatic translation of simulations and rapid prototypes created in the accessible desktop multimedia authoring package Macromedia Director into services ready for broadcast. This is achieved by a series of tools that de-skill and speed-up the process of creating digital TV user interfaces (UI) and applications for mobile terminals. The benefits of rapid prototyping are essential for the production of these new types of services, and are therefore discussed in the first section of this paper. In the following sections, an overview of the operation of content, service, creation and management sub-systems is presented, which illustrates why these tools compose an important and integral part of a system responsible of creating, delivering and managing converged broadcast and telecommunications services. The next section examines a number of metadata languages candidates for describing the iTV services user interface and the schema language adopted in this project. A detailed description of the operation of the two tools is provided to offer an insight of how they can be used to de-skill and speed-up the process of creating digital TV user interfaces and applications for mobile terminals. Finally, representative broadcast oriented and telecommunication oriented converged service components are also introduced, demonstrating how these tools have been used to generate different types of services.
Resumo:
Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94% corresponding to 50% - 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provide a good basis for the integration of speech, non-speech audio and video.
Resumo:
Second Life (SL) is an ideal platform for language learning. It is called a Multi-User Virtual Environment, where users can have varieties of learning experiences in life-like environments. Numerous attempts have been made to use SL as a platform for language teaching and the possibility of SL as a means to promote conversational interactions has been reported. However, the research so far has largely focused on simply using SL without further augmentations for communication between learners or between teachers and learners in a school-like environment. Conversely, not enough attention has been paid to its controllability which builds on the embedded functions in SL. This study, based on the latest theories of second language acquisition, especially on the Task Based Language Teaching and the Interaction Hypothesis, proposes to design and implement an automatized interactive task space (AITS) where robotic agents work as interlocutors of learners. This paper presents a design that incorporates the SLA theories into SL and the implementation method of the design to construct AITS, fulfilling the controllability of SL. It also presents the result of the evaluation experiment conducted on the constructed AITS.
Resumo:
Recently, stable markerless 6 DOF video based handtracking devices became available. These devices simultaneously track the positions and orientations of both user hands in different postures with at least 25 frames per second. Such hand-tracking allows for using the human hands as natural input devices. However, the absence of physical buttons for performing click actions and state changes poses severe challenges in designing an efficient and easy to use 3D interface on top of such a device. In particular, for coupling and decoupling a virtual object’s movements to the user’s hand (i.e. grabbing and releasing) a solution has to be found. In this paper, we introduce a novel technique for efficient two-handed grabbing and releasing objects and intuitively manipulating them in the virtual space. This technique is integrated in a novel 3D interface for virtual manipulations. A user experiment shows the superior applicability of this new technique. Last but not least, we describe how this technique can be exploited in practice to improve interaction by integrating it with RTT DeltaGen, a professional CAD/CAS visualization and editing tool.
Resumo:
The long-term preservation of complex works such as video games comes with many challenges. Emulation, currently the most adequate preservation strategy for video games, requires several acts that are technically possible, but closely governed and restricted by copyright law and technical protection measures. Without prior authorisation from the rightsholder(s), it is therefore difficult to legally emulate these works. However, games often have several rightsholders that are in some cases near impossible to identify or locate – particularly with regard to older games. This paper therefore focuses on these so-called orphan video games and examines whether (and to what extent) they are covered by the directive on certain permitted uses of orphan works 2012/28/EU (Orphan Works Directive). As complex works with software and audiovisual components, it is difficult to classify video games in their entirety. The Orphan Works Directive, however, only covers certain categories of works. This paper therefore analyses 1) whether video games in their entirety can be considered types of works that fall under the directive, i.e. audiovisual or cinematographic works, and 2) whether the provisions of the orphan work exception are suitable for the specifics of these complex, “multimedia” works.