813 resultados para VIDEO
Resumo:
Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is challenging task due to variations in the appearance of the faces, as videos are captured in an uncontrolled environment. A person's appearance may vary due to session variations including: lighting and background changes, occlusions, changes in expression and make up. In this paper we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus; and incorporate this into a novel two stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues; after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able effectively model the session variation observed in the data, resulting in improved clustering performance, with much greater computational efficiency than other methods.
Resumo:
This paper investigates the challenges of delivering parent training intervention for autism over video. We conducted a qualitative field study of an intervention, which is based on a well-established training program for parents of children with autism, called Hanen More Than Words. The study was conducted with a Hanen Certified speech pathologist who delivered video based training to two mothers, each with a son having autism. We conducted observations of 14 sessions of the intervention spanning 3 months along with 3 semi-structured interviews with each participant. We identified different activities that participants performed across different sessions and analysed them based upon their implications on technology. We found that all the participants welcomed video based training but they also faced several difficulties, particularly in establishing rapport with other participants, inviting equal participation, and in observing and providing feedback on parent-child interactions. Finally, we reflect on our findings and motivate further investigations by defining three design sensitivities of Adaptation, Group Participation, and Physical Setup.
Resumo:
In this paper we report the results of a study comparing implicit-only and explicit-only interactions in a collaborative, video-mediated task with shared content. Expanding on earlier work which has typically only evaluated how implicit interaction can augment primarily explicit systems, we report issues surrounding control, anxiousness and negotiation in the context of video mediated collaboration. We conclude that implicit interaction has the potential to improve collaborative work, but that there are a multitude of issues that must first be negotiated.
Resumo:
This PhD research has proposed new machine learning techniques to improve human action recognition based on local features. Several novel video representation and classification techniques have been proposed to increase the performance with lower computational complexity. The major contributions are the construction of new feature representation techniques, based on advanced machine learning techniques such as multiple instance dictionary learning, Latent Dirichlet Allocation (LDA) and Sparse coding. A Binary-tree based classification technique was also proposed to deal with large amounts of action categories. These techniques are not only improving the classification accuracy with constrained computational resources but are also robust to challenging environmental conditions. These developed techniques can be easily extended to a wide range of video applications to provide near real-time performance.
Resumo:
This paper examines incorporating video-stimulated recall (VSR) as a data collection technique in cross-cultural research. With VSR, participants are invited to watch video-recordings of particular events that they are involved in; they then recall their thoughts in relation to their observations of their behaviour in relation the event. The research draws on a larger PhD project completed at an Australian university that explored Vietnamese lecturers’ beliefs about learner autonomy. In cross-cultural research using the VSR technique provided significant challenges including time constraints of participants, misunderstandings of the VSR protocol and the possibility of participants’ losing face when reflecting on their teaching episodes. Adaptations to the VSR technique were required to meet the cultural challenges specific to this population, indicating a need for flexibility and awareness of the cultural context for research.
Resumo:
Many forms of formative feedback are used in dance training to refine the dancer’s spatial and kinaesthetic awareness in order that the dancer’s sensorimotor intentions and observable danced outcomes might converge. This paper documents the use of smartphones to record and playback movement sequences in ballet and contemporary technique classes. Peers in pairs took turns filming one another and then analysing the playback. This provided immediate visual feedback of the movement sequence as performed by each dancer. This immediacy facilitated the dancer’s capacity to associate what they felt as they were dancing with what they looked like during the dance. The often-dissonant realities of self-perception and perception by others were thus guided towards harmony, generating improved performance and knowledge relating to dance technique. An approach is offered for potential development of peer review activities to support summative progressive assessment in dance technique training.
Resumo:
Scalable video coding (SVC) is an emerging standard built on the success of advanced video coding standard (H.264/AVC) by the Joint video team (JVT). Motion compensated temporal filtering (MCTF) and Closed loop hierarchical B pictures (CHBP) are two important coding methods proposed during initial stages of standardization. Either of the coding methods, MCTF/CHBP performs better depending upon noise content and characteristics of the sequence. This work identifies other characteristics of the sequences for which performance of MCTF is superior to that of CHBP and presents a method to adaptively select either of MCTF and CHBP coding methods at the GOP level. This method, referred as "Adaptive Decomposition" is shown to provide better R-D performance than of that by using MCTF or CRBP only. Further this method is extended to non-scalable coders.
Resumo:
This paper explores the obstacles associated with designing video game levels for the purpose of objectively measuring flow. We sought to create three video game levels capable of inducing a flow state, an overload state (low-flow), and a boredom state (low-flow). A pilot study, in which participants self-reported levels of flow after playing all three game levels, was undertaken. Unexpected results point to the challenges of operationalising flow in video game research, obstacles in experimental design for invoking flow and low-flow, concerns about flow as a construct for measuring video game enjoyment, the applicability of self-report flow scales, and the experience of flow in video game play despite substantial challenge-skill differences.
Resumo:
Feature track matrix factorization based methods have been attractive solutions to the Structure-front-motion (Sfnl) problem. Group motion of the feature points is analyzed to get the 3D information. It is well known that the factorization formulations give rise to rank deficient system of equations. Even when enough constraints exist, the extracted models are sparse due the unavailability of pixel level tracks. Pixel level tracking of 3D surfaces is a difficult problem, particularly when the surface has very little texture as in a human face. Only sparsely located feature points can be tracked and tracking error arc inevitable along rotating lose texture surfaces. However, the 3D models of an object class lie in a subspace of the set of all possible 3D models. We propose a novel solution to the Structure-from-motion problem which utilizes the high-resolution 3D obtained from range scanner to compute a basis for this desired subspace. Adding subspace constraints during factorization also facilitates removal of tracking noise which causes distortions outside the subspace. We demonstrate the effectiveness of our formulation by extracting dense 3D structure of a human face and comparing it with a well known Structure-front-motion algorithm due to Brand.
Resumo:
Large external memory bandwidth requirement leads to increased system power dissipation and cost in video coding application. Majority of the external memory traffic in video encoder is due to reference data accesses. We describe a lossy reference frame compression technique that can be used in video coding with minimal impact on quality while significantly reducing power and bandwidth requirement. The low cost transformless compression technique uses lossy reference for motion estimation to reduce memory traffic, and lossless reference for motion compensation (MC) to avoid drift. Thus, it is compatible with all existing video standards. We calculate the quantization error bound and show that by storing quantization error separately, bandwidth overhead due to MC can be reduced significantly. The technique meets key requirements specific to the video encode application. 24-39% reduction in peak bandwidth and 23-31% reduction in total average power consumption are observed for IBBP sequences.
Resumo:
In this paper, we show that it is possible to reduce the complexity of Intra MB coding in H.264/AVC based on a novel chance constrained classifier. Using the pairs of simple mean-variances values, our technique is able to reduce the complexity of Intra MB coding process with a negligible loss in PSNR. We present an alternate approach to address the classification problem which is equivalent to machine learning. Implementation results show that the proposed method reduces encoding time to about 20% of the reference implementation with average loss of 0.05 dB in PSNR.
Resumo:
A built-in-self-test (BIST) subsystem embedded in a 65-nm mobile broadcast video receiver is described. The subsystem is designed to perform analog and RF measurements at multiple internal nodes of the receiver. It uses a distributed network of CMOS sensors and a low bandwidth, 12-bit A/D converter to perform the measurements with a serial bus interface enabling a digital transfer of measured data to automatic test equipment (ATE). A perturbation/correlation based BIST method is described, which makes pass/fail determination on parts, resulting in significant test time and cost reduction.
Resumo:
With the advent of Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance will be the key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant to server faults and hence fail to comply to multi-user, failure-prone networks such as the Internet. Current research areas concerning VoD often focus on increasing the throughput and reliability of single server, but rarely addresses the smooth provision of service during server as well as network failures. Reliable Server Pooling (RSerPool), being capable of providing high availability by using multiple redundant servers as single source point, can be a solution to overcome the above failures. During a possible server failure, the continuity of service is retained by another server. In order to achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present an elegant, simple, efficient and scalable approach which has been developed to facilitate the transfer of state by the client itself, using extended cookie mechanism, which ensures that there is no noticeable change in disruption or the video quality.
Resumo:
Rate control regulates the instantaneous video bit -rate to maximize a picture quality metric while satisfying channel constraints. Typically, a quality metric such as Peak Signalto-Noise ratio (PSNR) or weighted signal -to-noise ratio(WSNR) is chosen out of convenience. However this metric is not always truly representative of perceptual video quality.Attempts to use perceptual metrics in rate control have been limited by the accuracy of the video quality metrics chosen.Recently, new and improved metrics of subjective quality such as the Video quality experts group's (VQEG) NTIA1 General Video Quality Model (VQM) have been proven to have strong correlation with subjective quality. Here, we apply the key principles of the NTIA -VQM model to rate control in order to maximize perceptual video quality. Our experiments demonstrate that applying NTIA -VQM motivated metrics to standard TMN8 rate control in an H.263 encoder results in perceivable quality improvements over a baseline TMN8 / MSE based implementation.