932 resultados para Speech and pioneering sports Colima
Resumo:
The problem of human detection is challenging, more so, when faced with adverse conditions such as occlusion and background clutter. This paper addresses the problem of human detection by representing an extracted feature of an image using a sparse linear combination of chosen dictionary atoms. The detection along with the scale finding, is done by using the coefficients obtained from sparse representation. This is of particular interest as we address the problem of scale using a scale-embedded dictionary where the conventional methods detect the object by running the detection window at all scales.
Resumo:
The classical approach to A/D conversion has been uniform sampling and we get perfect reconstruction for bandlimited signals by satisfying the Nyquist Sampling Theorem. We propose a non-uniform sampling scheme based on level crossing (LC) time information. We show stable reconstruction of bandpass signals with correct scale factor and hence a unique reconstruction from only the non-uniform time information. For reconstruction from the level crossings we make use of the sparse reconstruction based optimization by constraining the bandpass signal to be sparse in its frequency content. While overdetermined system of equations is resorted to in the literature we use an undetermined approach along with sparse reconstruction formulation. We could get a reconstruction SNR > 20dB and perfect support recovery with probability close to 1, in noise-less case and with lower probability in the noisy case. Random picking of LC from different levels over the same limited signal duration and for the same length of information, is seen to be advantageous for reconstruction.
Resumo:
This paper considers the problem of weak signal detection in the presence of navigation data bits for Global Navigation Satellite System (GNSS) receivers. Typically, a set of partial coherent integration outputs are non-coherently accumulated to combat the effects of model uncertainties such as the presence of navigation data-bits and/or frequency uncertainty, resulting in a sub-optimal test statistic. In this work, the test-statistic for weak signal detection is derived in the presence of navigation data-bits from the likelihood ratio. It is highlighted that averaging the likelihood ratio based test-statistic over the prior distributions of the unknown data bits and the carrier phase uncertainty leads to the conventional Post Detection Integration (PDI) technique for detection. To improve the performance in the presence of model uncertainties, a novel cyclostationarity based sub-optimal PDI technique is proposed. The test statistic is analytically characterized, and shown to be robust to the presence of navigation data-bits, frequency, phase and noise uncertainties. Monte Carlo simulation results illustrate the validity of the theoretical results and the superior performance offered by the proposed detector in the presence of model uncertainties.
Resumo:
The notion of the 1-D analytic signal is well understood and has found many applications. At the heart of the analytic signal concept is the Hilbert transform. The problem in extending the concept of analytic signal to higher dimensions is that there is no unique multidimensional definition of the Hilbert transform. Also, the notion of analyticity is not so well under stood in higher dimensions. Of the several 2-D extensions of the Hilbert transform, the spiral-phase quadrature transform or the Riesz transform seems to be the natural extension and has attracted a lot of attention mainly due to its isotropic properties. From the Riesz transform, Larkin et al. constructed a vortex operator, which approximates the quadratures based on asymptotic stationary-phase analysis. In this paper, we show an alternative proof for the quadrature approximation property by invoking the quasi-eigenfunction property of linear, shift-invariant systems. We show that the vortex operator comes up as a natural consequence of applying this property. We also characterize the quadrature approximation error in terms of its energy as well as the peak spatial-domain error. Such results are available for 1-D signals, but their counter part for 2-D signals have not been provided. We also provide simulation results to supplement the analytical calculations.
Resumo:
We propose a novel space-time descriptor for region-based tracking which is very concise and efficient. The regions represented by covariance matrices within a temporal fragment, are used to estimate this space-time descriptor which we call the Eigenprofiles(EP). EP so obtained is used in estimating the Covariance Matrix of features over spatio-temporal fragments. The Second Order Statistics of spatio-temporal fragments form our target model which can be adapted for variations across the video. The model being concise also allows the use of multiple spatially overlapping fragments to represent the target. We demonstrate good tracking results on very challenging datasets, shot under insufficient illumination conditions.
Resumo:
Epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high performance epoch extraction algorithms require either dynamic programming techniques or a priori information of the average pitch period. An algorithm without such requirements is proposed based on integrated linear prediction residual (ILPR) which resembles the voice source signal. Half wave rectified and negated ILPR (or Hilbert transform of ILPR) is used as the pre-processed signal. A new non-linear temporal measure named the plosion index (PI) has been proposed for detecting `transients' in speech signal. An extension of PI, called the dynamic plosion index (DPI) is applied on pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases which provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for its robustness in the presence of additive white and babble noise and on simulated telephone quality speech. The performance of the DPI algorithm is found to be comparable or better than five state-of-the-art techniques for the experiments considered.
Resumo:
We address the problem of multi-instrument recognition in polyphonic music signals. Individual instruments are modeled within a stochastic framework using Student's-t Mixture Models (tMMs). We impose a mixture of these instrument models on the polyphonic signal model. No a priori knowledge is assumed about the number of instruments in the polyphony. The mixture weights are estimated in a latent variable framework from the polyphonic data using an Expectation Maximization (EM) algorithm, derived for the proposed approach. The weights are shown to indicate instrument activity. The output of the algorithm is an Instrument Activity Graph (IAG), using which, it is possible to find out the instruments that are active at a given time. An average F-ratio of 0 : 7 5 is obtained for polyphonies containing 2-5 instruments, on a experimental test set of 8 instruments: clarinet, flute, guitar, harp, mandolin, piano, trombone and violin.
Resumo:
We propose to employ bilateral filters to solve the problem of edge detection. The proposed methodology presents an efficient and noise robust method for detecting edges. Classical bilateral filters smooth images without distorting edges. In this paper, we modify the bilateral filter to perform edge detection, which is the opposite of bilateral smoothing. The Gaussian domain kernel of the bilateral filter is replaced with an edge detection mask, and Gaussian range kernel is replaced with an inverted Gaussian kernel. The modified range kernel serves to emphasize dissimilar regions. The resulting approach effectively adapts the detection mask according as the pixel intensity differences. The results of the proposed algorithm are compared with those of standard edge detection masks. Comparisons of the bilateral edge detector with Canny edge detection algorithm, both after non-maximal suppression, are also provided. The results of our technique are observed to be better and noise-robust than those offered by methods employing masks alone, and are also comparable to the results from Canny edge detector, outperforming it in certain cases.
Resumo:
In this paper, we have proposed a simple and effective approach to classify H.264 compressed videos, by capturing orientation information from the motion vectors. Our major contribution involves computing Histogram of Oriented Motion Vectors (HOMV) for overlapping hierarchical Space-Time cubes. The Space-Time cubes selected are partially overlapped. HOMV is found to be very effective to define the motion characteristics of these cubes. We then use Bag of Features (B OF) approach to define the video as histogram of HOMV keywords, obtained using k-means clustering. The video feature, thus computed, is found to be very effective in classifying videos. We demonstrate our results with experiments on two large publicly available video database.
Resumo:
Sparse representation based classification (SRC) is one of the most successful methods that has been developed in recent times for face recognition. Optimal projection for Sparse representation based classification (OPSRC)1] provides a dimensionality reduction map that is supposed to give optimum performance for SRC framework. However, the computational complexity involved in this method is too high. Here, we propose a new projection technique using the data scatter matrix which is computationally superior to the optimal projection method with comparable classification accuracy with respect OPSRC. The performance of the proposed approach is benchmarked with various publicly available face database.
Resumo:
Numerous algorithms have been proposed recently for sparse signal recovery in Compressed Sensing (CS). In practice, the number of measurements can be very limited due to the nature of the problem and/or the underlying statistical distribution of the non-zero elements of the sparse signal may not be known a priori. It has been observed that the performance of any sparse signal recovery algorithm depends on these factors, which makes the selection of a suitable sparse recovery algorithm difficult. To take advantage in such situations, we propose to use a fusion framework using which we employ multiple sparse signal recovery algorithms and fuse their estimates to get a better estimate. Theoretical results justifying the performance improvement are shown. The efficacy of the proposed scheme is demonstrated by Monte Carlo simulations using synthetic sparse signals and ECG signals selected from MIT-BIH database.
Resumo:
Designing a robust algorithm for visual object tracking has been a challenging task since many years. There are trackers in the literature that are reasonably accurate for many tracking scenarios but most of them are computationally expensive. This narrows down their applicability as many tracking applications demand real time response. In this paper, we present a tracker based on random ferns. Tracking is posed as a classification problem and classification is done using ferns. We used ferns as they rely on binary features and are extremely fast at both training and classification as compared to other classification algorithms. Our experiments show that the proposed tracker performs well on some of the most challenging tracking datasets and executes much faster than one of the state-of-the-art trackers, without much difference in tracking accuracy.
Resumo:
Time-varying linear prediction has been studied in the context of speech signals, in which the auto-regressive (AR) coefficients of the system function are modeled as a linear combination of a set of known bases. Traditionally, least squares minimization is used for the estimation of model parameters of the system. Motivated by the sparse nature of the excitation signal for voiced sounds, we explore the time-varying linear prediction modeling of speech signals using sparsity constraints. Parameter estimation is posed as a 0-norm minimization problem. The re-weighted 1-norm minimization technique is used to estimate the model parameters. We show that for sparsely excited time-varying systems, the formulation models the underlying system function better than the least squares error minimization approach. Evaluation with synthetic and real speech examples show that the estimated model parameters track the formant trajectories closer than the least squares approach.
Resumo:
We address the problem of parameter estimation of an ellipse from a limited number of samples. We develop a new approach for solving the ellipse fitting problem by showing that the x and y coordinate functions of an ellipse are finite-rate-of-innovation (FRI) signals. Uniform samples of x and y coordinate functions of the ellipse are modeled as a sum of weighted complex exponentials, for which we propose an efficient annihilating filter technique to estimate the ellipse parameters from the samples. The FRI framework allows for estimating the ellipse parameters reliably from partial or incomplete measurements even in the presence of noise. The efficiency and robustness of the proposed method is compared with state-of-art direct method. The experimental results show that the estimated parameters have lesser bias compared with the direct method and the estimation error is reduced by 5-10 dB relative to the direct method.
Resumo:
We consider the problem of parameter estimation from real-valued multi-tone signals. Such problems arise frequently in spectral estimation. More recently, they have gained new importance in finite-rate-of-innovation signal sampling and reconstruction. The annihilating filter is a key tool for parameter estimation in these problems. The standard annihilating filter design has to be modified to result in accurate estimation when dealing with real sinusoids, particularly because the real-valued nature of the sinusoids must be factored into the annihilating filter design. We show that the constraint on the annihilating filter can be relaxed by making use of the Hilbert transform. We refer to this approach as the Hilbert annihilating filter approach. We show that accurate parameter estimation is possible by this approach. In the single-tone case, the mean-square error performance increases by 6 dB for signal-to-noise ratio (SNR) greater than 0 dB. We also present experimental results in the multi-tone case, which show that a significant improvement (about 6 dB) is obtained when the parameters are close to 0 or pi. In the mid-frequency range, the improvement is about 2 to 3 dB.