124 resultados para 3D Video Telecommunication Multimedia
em Indian Institute of Science - Bangalore - Índia
Resumo:
Real time anomaly detection is the need of the hour for any security applications. In this article, we have proposed a real time anomaly detection for H.264 compressed video streams utilizing pre-encoded motion vectors (MVs). The proposed work is principally motivated by the observation that MVs have distinct characteristics during anomaly than usual. Our observation shows that H.264 MV magnitude and orientation contain relevant information which can be used to model the usual behavior (UB) effectively. This is subsequently extended to detect abnormality/anomaly based on the probability of occurrence of a behavior. The performance of the proposed algorithm was evaluated and bench-marked on UMN and Ped anomaly detection video datasets, with a detection rate of 70 frames per sec resulting in 90x and 250x speedup, along with on-par detection accuracy compared to the state-of-the-art algorithms.
Resumo:
A novel algorithm for Virtual View Synthesis based on Non-Local Means Filtering is presented in this paper. Apart from using the video frames from the nearby cameras and the corresponding per-pixel depth map, this algorithm also makes use of the previously synthesized frame. Simple and efficient, the algorithm can synthesize video at any given virtual viewpoint at a faster rate. In the process, the quality of the synthesized frame is not compromised. Experimental results prove the above mentioned claim. The subjective and objective quality of the synthesized frames are comparable to the existing algorithms.
Resumo:
Feature track matrix factorization based methods have been attractive solutions to the Structure-front-motion (Sfnl) problem. Group motion of the feature points is analyzed to get the 3D information. It is well known that the factorization formulations give rise to rank deficient system of equations. Even when enough constraints exist, the extracted models are sparse due the unavailability of pixel level tracks. Pixel level tracking of 3D surfaces is a difficult problem, particularly when the surface has very little texture as in a human face. Only sparsely located feature points can be tracked and tracking error arc inevitable along rotating lose texture surfaces. However, the 3D models of an object class lie in a subspace of the set of all possible 3D models. We propose a novel solution to the Structure-from-motion problem which utilizes the high-resolution 3D obtained from range scanner to compute a basis for this desired subspace. Adding subspace constraints during factorization also facilitates removal of tracking noise which causes distortions outside the subspace. We demonstrate the effectiveness of our formulation by extracting dense 3D structure of a human face and comparing it with a well known Structure-front-motion algorithm due to Brand.
Resumo:
With the availability of a huge amount of video data on various sources, efficient video retrieval tools are increasingly in demand. Video being a multi-modal data, the perceptions of ``relevance'' between the user provided query video (in case of Query-By-Example type of video search) and retrieved video clips are subjective in nature. We present an efficient video retrieval method that takes user's feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vectors (QFV) for improved video retrieval. The QFV reformulation is done by a simple, but powerful feature weight optimization method based on Simultaneous Perturbation Stochastic Approximation (SPSA) technique. A video retrieval system with video indexing, searching and relevance feedback (RF) phases is built for demonstrating the performance of the proposed method. The query and database videos are indexed using the conventional video features like color, texture, etc. However, we use the comprehensive and novel methods of feature representations, and a spatio-temporal distance measure to retrieve the top M videos that are similar to the query. In feedback phase, the user activated iterative on the previously retrieved videos is used to reformulate the QFV weights (measure of importance) that reflect the user's preference, automatically. It is our observation that a few iterations of such feedback are generally sufficient for retrieving the desired video clips. The novel application of SPSA based RF for user-oriented feature weights optimization makes the proposed method to be distinct from the existing ones. The experimental results show that the proposed RF based video retrieval exhibit good performance.
Resumo:
We extend the modeling heuristic of (Harsha et al. 2006. In IEEE IWQoS 06, pp 178 - 187) to evaluate the performance of an IEEE 802.11e infrastructure network carrying packet telephone calls, streaming video sessions and TCP controlled file downloads, using Enhanced Distributed Channel Access (EDCA). We identify the time boundaries of activities on the channel (called channel slot boundaries) and derive a Markov Renewal Process of the contending nodes on these epochs. This is achieved by the use of attempt probabilities of the contending nodes as those obtained from the saturation fixed point analysis of (Ramaiyan et al. 2005. In Proceedings ACM Sigmetrics, `05. Journal version accepted for publication in IEEE TON). Regenerative analysis on this MRP yields the desired steady state performance measures. We then use the MRP model to develop an effective bandwidth approach for obtaining a bound on the size of the buffer required at the video queue of the AP, such that the streaming video packet loss probability is kept to less than 1%. The results obtained match well with simulations using the network simulator, ns-2. We find that, with the default IEEE 802.11e EDCA parameters for access categories AC 1, AC 2 and AC 3, the voice call capacity decreases if even one streaming video session and one TCP file download are initiated by some wireless station. Subsequently, reducing the voice calls increases the video downlink stream throughput by 0.38 Mbps and file download capacity by 0.14 Mbps, for every voice call (for the 11 Mbps PHY). We find that a buffer size of 75KB is sufficient to ensure that the video packet loss probability at the QAP is within 1%.
Resumo:
Scalable video coding (SVC) is an emerging standard built on the success of advanced video coding standard (H.264/AVC) by the Joint video team (JVT). Motion compensated temporal filtering (MCTF) and Closed loop hierarchical B pictures (CHBP) are two important coding methods proposed during initial stages of standardization. Either of the coding methods, MCTF/CHBP performs better depending upon noise content and characteristics of the sequence. This work identifies other characteristics of the sequences for which performance of MCTF is superior to that of CHBP and presents a method to adaptively select either of MCTF and CHBP coding methods at the GOP level. This method, referred as "Adaptive Decomposition" is shown to provide better R-D performance than of that by using MCTF or CRBP only. Further this method is extended to non-scalable coders.
Resumo:
We provide a survey of some of our recent results ([9], [13], [4], [6], [7]) on the analytical performance modeling of IEEE 802.11 wireless local area networks (WLANs). We first present extensions of the decoupling approach of Bianchi ([1]) to the saturation analysis of IEEE 802.11e networks with multiple traffic classes. We have found that even when analysing WLANs with unsaturated nodes the following state dependent service model works well: when a certain set of nodes is nonempty, their channel attempt behaviour is obtained from the corresponding fixed point analysis of the saturated system. We will present our experiences in using this approximation to model multimedia traffic over an IEEE 802.11e network using the enhanced DCF channel access (EDCA) mechanism. We have found that we can model TCP controlled file transfers, VoIP packet telephony, and streaming video in the IEEE802.11e setting by this simple approximation.
Resumo:
CD-ROMs have proliferated as a distribution media for desktop machines for a large variety of multimedia applications (targeted for a single-user environment) like encyclopedias, magazines and games. With CD-ROM capacities up to 3 GB being available in the near future, they will form an integral part of Video on Demand (VoD) servers to store full-length movies and multimedia. In the first section of this paper we look at issues related to the single- user desktop environment. Since these multimedia applications are highly interactive in nature, we take a pragmatic approach, and have made a detailed study of the multimedia application behavior in terms of the I/O request patterns generated to the CD-ROM subsystem by tracing these patterns. We discuss prefetch buffer design and seek time characteristics in the context of the analysis of these traces. We also propose an adaptive main-memory hosted cache that receives caching hints from the application to reduce the latency when the user moves from one node of the hyper graph to another. In the second section we look at the use of CD-ROM in a VoD server and discuss the problem of scheduling multiple request streams and buffer management in this scenario. We adapt the C-SCAN (Circular SCAN) algorithm to suit the CD-ROM drive characteristics and prove that it is optimal in terms of buffer size management. We provide computationally inexpensive relations by which this algorithm can be implemented. We then propose an admission control algorithm which admits new request streams without disrupting the continuity of playback of the previous request streams. The algorithm also supports operations such as fast forward and replay. Finally, we discuss the problem of optimal placement of MPEG streams on CD-ROMs in the third section.
Resumo:
This paper addresses the problem of how to select the optimal number of sensors and how to determine their placement in a given monitored area for multimedia surveillance systems. We propose to solve this problem by obtaining a novel performance metric in terms of a probability measure for accomplishing the task as a function of set of sensors and their placement. This measure is then used to find the optimal set. The same measure can be used to analyze the degradation in system 's performance with respect to the failure of various sensors. We also build a surveillance system using the optimal set of sensors obtained based on the proposed design methodology. Experimental results show the effectiveness of the proposed design methodology in selecting the optimal set of sensors and their placement.
Resumo:
Prediction of variable bit rate compressed video traffic is critical to dynamic allocation of resources in a network. In this paper, we propose a technique for preprocessing the dataset used for training a video traffic predictor. The technique involves identifying the noisy instances in the data using a fuzzy inference system. We focus on three prediction techniques, namely, linear regression, neural network and support vector regression and analyze their performance on H.264 video traces. Our experimental results reveal that data preprocessing greatly improves the performance of linear regression and neural network, but is not effective on support vector regression.
Resumo:
The key requirements for enabling real-time remote healthcare service on a mobile platform, in the present day heterogeneous wireless access network environment, are uninterrupted and continuous access to the online patient vital medical data, monitor the physical condition of the patient through video streaming, and so on. For an application, this continuity has to be sufficiently transparent both from a performance perspective as well as a Quality of Experience (QoE) perspective. While mobility protocols (MIPv6, HIP, SCTP, DSMIP, PMIP, and SIP) strive to provide both and do so, limited or non-availability (deployment) of these protocols on provider networks and server side infrastructure has impeded adoption of mobility on end user platforms. Add to this, the cumbersome OS configuration procedures required to enable mobility protocol support on end user devices and the user's enthusiasm to add this support is lost. Considering the lack of proper mobility implementations that meet the remote healthcare requirements above, we propose SeaMo+ that comprises a light-weight application layer framework, termed as the Virtual Real-time Multimedia Service (VRMS) for mobile devices to provide an uninterrupted real-time multimedia information access to the mobile user. VRMS is easy to configure, platform independent, and does not require additional network infrastructure unlike other existing schemes. We illustrate the working of SeaMo+ in two realistic remote patient monitoring application scenarios.
Resumo:
In this paper optical code-division multiple-access (O-CDMA) packet network is considered, which offers inherent security in the access networks. The application of O-CDMA to multimedia transmission (voice, data, and video) is investigated. The simultaneous transmission of various services is achieved by assigning to each user unique multiple code signatures. Thus, by applying a parallel mapping technique, we achieve multi-rate services. A random access protocol is proposed, here, where all distinct codes are used, for packet transmission. The codes, Optical Orthogonal Code (OOC), or 1D codes and Wavelength/Time Single-Pulse-per-Row (W/T SPR), or 2D codes, are analyzed. These 1D and 2D codes with varied weight are used to differentiate the Quality of Service (QoS). The theoretical bit error probability corresponding to the quality of each service is established using 1D and 2D codes in the receiver noiseless case and compared. The results show that, using 2D codes QoS in multimedia transmission is better than using 1D codes.
Resumo:
Advertising is ubiquitous in the online community and more so in the ever-growing and popular online video delivery websites (e. g., YouTube). Video advertising is becoming increasingly popular on these websites. In addition to the existing pre-roll/post-roll advertising and contextual advertising, this paper proposes an in-stream video advertising strategy-Computational Affective Video-in-Video Advertising (CAVVA). Humans being emotional creatures are driven by emotions as well as rational thought. We believe that emotions play a major role in influencing the buying behavior of users and hence propose a video advertising strategy which takes into account the emotional impact of the videos as well as advertisements. Given a video and a set of advertisements, we identify candidate advertisement insertion points (step 1) and also identify the suitable advertisements (step 2) according to theories from marketing and consumer psychology. We formulate this two part problem as a single optimization function in a non-linear 0-1 integer programming framework and provide a genetic algorithm based solution. We evaluate CAVVA using a subjective user-study and eye-tracking experiment. Through these experiments, we demonstrate that CAVVA achieves a good balance between the following seemingly conflicting goals of (a) minimizing the user disturbance because of advertisement insertion while (b) enhancing the user engagement with the advertising content. We compare our method with existing advertising strategies and show that CAVVA can enhance the user's experience and also help increase the monetization potential of the advertising content.
Resumo:
This paper discusses a novel high-speed approach for human action recognition in H.264/AVC compressed domain. The proposed algorithm utilizes cues from quantization parameters and motion vectors extracted from the compressed video sequence for feature extraction and further classification using Support Vector Machines (SVM). The ultimate goal of the proposed work is to portray a much faster algorithm than pixel domain counterparts, with comparable accuracy, utilizing only the sparse information from compressed video. Partial decoding rules out the complexity of full decoding, and minimizes computational load and memory usage, which can result in reduced hardware utilization and faster recognition results. The proposed approach can handle illumination changes, scale, and appearance variations, and is robust to outdoor as well as indoor testing scenarios. We have evaluated the performance of the proposed method on two benchmark action datasets and achieved more than 85 % accuracy. The proposed algorithm classifies actions with speed (> 2,000 fps) approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.