34 resultados para Music video
Resumo:
With the availability of a huge amount of video data on various sources, efficient video retrieval tools are increasingly in demand. Video being a multi-modal data, the perceptions of ``relevance'' between the user provided query video (in case of Query-By-Example type of video search) and retrieved video clips are subjective in nature. We present an efficient video retrieval method that takes user's feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vectors (QFV) for improved video retrieval. The QFV reformulation is done by a simple, but powerful feature weight optimization method based on Simultaneous Perturbation Stochastic Approximation (SPSA) technique. A video retrieval system with video indexing, searching and relevance feedback (RF) phases is built for demonstrating the performance of the proposed method. The query and database videos are indexed using the conventional video features like color, texture, etc. However, we use the comprehensive and novel methods of feature representations, and a spatio-temporal distance measure to retrieve the top M videos that are similar to the query. In feedback phase, the user activated iterative on the previously retrieved videos is used to reformulate the QFV weights (measure of importance) that reflect the user's preference, automatically. It is our observation that a few iterations of such feedback are generally sufficient for retrieving the desired video clips. The novel application of SPSA based RF for user-oriented feature weights optimization makes the proposed method to be distinct from the existing ones. The experimental results show that the proposed RF based video retrieval exhibit good performance.
Resumo:
Scalable video coding (SVC) is an emerging standard built on the success of advanced video coding standard (H.264/AVC) by the Joint video team (JVT). Motion compensated temporal filtering (MCTF) and Closed loop hierarchical B pictures (CHBP) are two important coding methods proposed during initial stages of standardization. Either of the coding methods, MCTF/CHBP performs better depending upon noise content and characteristics of the sequence. This work identifies other characteristics of the sequences for which performance of MCTF is superior to that of CHBP and presents a method to adaptively select either of MCTF and CHBP coding methods at the GOP level. This method, referred as "Adaptive Decomposition" is shown to provide better R-D performance than of that by using MCTF or CRBP only. Further this method is extended to non-scalable coders.
Resumo:
The problem of automatic melody line identification in a MIDI file plays an important role towards taking QBH systems to the next level. We present here, a novel algorithm to identify the melody line in a polyphonic MIDI file. A note pruning and track/channel ranking method is used to identify the melody line. We use results from musicology to derive certain simple heuristics for the note pruning stage. This helps in the robustness of the algorithm, by way of discarding "spurious" notes. A ranking based on the melodic information in each track/channel enables us to choose the melody line accurately. Our algorithm makes no assumption about MIDI performer specific parameters, is simple and achieves an accuracy of 97% in identifying the melody line correctly. This algorithm is currently being used by us in a QBH system built in our lab.
Resumo:
Feature track matrix factorization based methods have been attractive solutions to the Structure-front-motion (Sfnl) problem. Group motion of the feature points is analyzed to get the 3D information. It is well known that the factorization formulations give rise to rank deficient system of equations. Even when enough constraints exist, the extracted models are sparse due the unavailability of pixel level tracks. Pixel level tracking of 3D surfaces is a difficult problem, particularly when the surface has very little texture as in a human face. Only sparsely located feature points can be tracked and tracking error arc inevitable along rotating lose texture surfaces. However, the 3D models of an object class lie in a subspace of the set of all possible 3D models. We propose a novel solution to the Structure-from-motion problem which utilizes the high-resolution 3D obtained from range scanner to compute a basis for this desired subspace. Adding subspace constraints during factorization also facilitates removal of tracking noise which causes distortions outside the subspace. We demonstrate the effectiveness of our formulation by extracting dense 3D structure of a human face and comparing it with a well known Structure-front-motion algorithm due to Brand.
Resumo:
We propose a simple speech music discriminator that uses features based on HILN(Harmonics, Individual Lines and Noise) model. We have been able to test the strength of the feature set on a standard database of 66 files and get an accuracy of around 97%. We also have tested on sung queries and polyphonic music and have got very good results. The current algorithm is being used to discriminate between sung queries and played (using an instrument like flute) queries for a Query by Humming(QBH) system currently under development in the lab.
Resumo:
Large external memory bandwidth requirement leads to increased system power dissipation and cost in video coding application. Majority of the external memory traffic in video encoder is due to reference data accesses. We describe a lossy reference frame compression technique that can be used in video coding with minimal impact on quality while significantly reducing power and bandwidth requirement. The low cost transformless compression technique uses lossy reference for motion estimation to reduce memory traffic, and lossless reference for motion compensation (MC) to avoid drift. Thus, it is compatible with all existing video standards. We calculate the quantization error bound and show that by storing quantization error separately, bandwidth overhead due to MC can be reduced significantly. The technique meets key requirements specific to the video encode application. 24-39% reduction in peak bandwidth and 23-31% reduction in total average power consumption are observed for IBBP sequences.
Resumo:
In the direction of arrival (DOA) estimation problem, we encounter both finite data and insufficient knowledge of array characterization. It is therefore important to study how subspace-based methods perform in such conditions. We analyze the finite data performance of the multiple signal classification (MUSIC) and minimum norm (min. norm) methods in the presence of sensor gain and phase errors, and derive expressions for the mean square error (MSE) in the DOA estimates. These expressions are first derived assuming an arbitrary array and then simplified for the special case of an uniform linear array with isotropic sensors. When they are further simplified for the case of finite data only and sensor errors only, they reduce to the recent results given in [9-12]. Computer simulations are used to verify the closeness between the predicted and simulated values of the MSE.
Resumo:
In this paper, we show that it is possible to reduce the complexity of Intra MB coding in H.264/AVC based on a novel chance constrained classifier. Using the pairs of simple mean-variances values, our technique is able to reduce the complexity of Intra MB coding process with a negligible loss in PSNR. We present an alternate approach to address the classification problem which is equivalent to machine learning. Implementation results show that the proposed method reduces encoding time to about 20% of the reference implementation with average loss of 0.05 dB in PSNR.
Resumo:
A built-in-self-test (BIST) subsystem embedded in a 65-nm mobile broadcast video receiver is described. The subsystem is designed to perform analog and RF measurements at multiple internal nodes of the receiver. It uses a distributed network of CMOS sensors and a low bandwidth, 12-bit A/D converter to perform the measurements with a serial bus interface enabling a digital transfer of measured data to automatic test equipment (ATE). A perturbation/correlation based BIST method is described, which makes pass/fail determination on parts, resulting in significant test time and cost reduction.
Resumo:
With the advent of Internet, video over IP is gaining popularity. In such an environment, scalability and fault tolerance will be the key issues. Existing video on demand (VoD) service systems are usually neither scalable nor tolerant to server faults and hence fail to comply to multi-user, failure-prone networks such as the Internet. Current research areas concerning VoD often focus on increasing the throughput and reliability of single server, but rarely addresses the smooth provision of service during server as well as network failures. Reliable Server Pooling (RSerPool), being capable of providing high availability by using multiple redundant servers as single source point, can be a solution to overcome the above failures. During a possible server failure, the continuity of service is retained by another server. In order to achieve transparent failover, efficient state sharing is an important requirement. In this paper, we present an elegant, simple, efficient and scalable approach which has been developed to facilitate the transfer of state by the client itself, using extended cookie mechanism, which ensures that there is no noticeable change in disruption or the video quality.
Resumo:
Rate control regulates the instantaneous video bit -rate to maximize a picture quality metric while satisfying channel constraints. Typically, a quality metric such as Peak Signalto-Noise ratio (PSNR) or weighted signal -to-noise ratio(WSNR) is chosen out of convenience. However this metric is not always truly representative of perceptual video quality.Attempts to use perceptual metrics in rate control have been limited by the accuracy of the video quality metrics chosen.Recently, new and improved metrics of subjective quality such as the Video quality experts group's (VQEG) NTIA1 General Video Quality Model (VQM) have been proven to have strong correlation with subjective quality. Here, we apply the key principles of the NTIA -VQM model to rate control in order to maximize perceptual video quality. Our experiments demonstrate that applying NTIA -VQM motivated metrics to standard TMN8 rate control in an H.263 encoder results in perceivable quality improvements over a baseline TMN8 / MSE based implementation.
Resumo:
Non-Identical Duplicate video detection is a challenging research problem. Non-Identical Duplicate video are a pair of videos that are not exactly identical but are almost similar.In this paper, we evaluate two methods - Keyframe -based and Tomography-based methods to determine the Non-Identical Duplicate videos. These two methods make use of the existing scale based shift invariant (SIFT) method to find the match between the key frames in first method, and the cross-sections through the temporal axis of the videos in second method.We provide extensive experimental results and the analysis of accuracy and efficiency of the above two methods on a data set of Non- Identical Duplicate video-pair.
Resumo:
Image and video filtering is a key image-processing task in computer vision especially in noisy environment. In most of the cases the noise source is unknown and hence possess a major difficulty in the filtering operation. In this paper we present an error-correction based learning approach for iterative filtering. A new FIR filter is designed in which the filter coefficients are updated based on Widrow-Hoff rule. Unlike the standard filter the proposed filter has the ability to remove noise without the a priori knowledge of the noise. Experimental result shows that the proposed filter efficiently removes the noise and preserves the edges in the image. We demonstrate the capability of the proposed algorithm by testing it on standard images infected by Gaussian noise and on a real time video containing inherent noise. Experimental result shows that the proposed filter is better than some of the existing standard filters
Resumo:
We analyze the AlApana of a Carnatic music piece without the prior knowledge of the singer or the rAga. AlApana is ameans to communicate to the audience, the flavor or the bhAva of the rAga through the permitted notes and its phrases. The input to our analysis is a recording of the vocal AlApana along with the accompanying instrument. The AdhAra shadja(base note) of the singer for that AlApana is estimated through a stochastic model of note frequencies. Based on the shadja, we identify the notes (swaras) used in the AlApana using a semi-continuous GMM. Using the probabilities of each note interval, we recognize swaras of the AlApana. For sampurNa rAgas, we can identify the possible rAga, based on the swaras. We have been able to achieve correct shadja identification, which is crucial to all further steps, in 88.8% of 55 AlApanas. Among them (48 AlApanas of 7 rAgas), we get 91.5% correct swara identification and 62.13% correct R (rAga) accuracy.