55 results for Speech and pioneering sports Colima
Abstract:
We present an improved language modeling technique for the Lempel-Ziv-Welch (LZW)-based language identification (LID) scheme. The previous LZW-based approach to LID builds the language pattern table with the LZW algorithm itself. Because the LZW algorithm is sequential, several language-specific patterns were missing from the pattern table. To overcome this, we build a universal pattern table that contains all patterns of different lengths. For each language, the corresponding language-specific pattern table is constructed by retaining those patterns of the universal table whose frequency of appearance in the training data is above a threshold. This approach reduces the classification score (compression ratio [LZW-CR] or weighted discriminant score [LZW-WDS]) for non-native languages and considerably increases LID performance.
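The pattern-table idea can be illustrated with a minimal Python sketch. It rests on several assumptions. The universal table is approximated by counting every substring up to a hypothetical maximum length `max_len`. A language table keeps the patterns whose count exceeds `threshold`. The score is a compression ratio (input symbols per emitted code) obtained by greedy longest-match parsing against that fixed table. The function names and the parsing rule are illustrative and are not the authors' implementation.

```python
from collections import Counter


def universal_pattern_counts(texts, max_len):
    """Count every substring of length 1..max_len in the training texts."""
    counts = Counter()
    for text in texts:
        for i in range(len(text)):
            for length in range(1, min(max_len, len(text) - i) + 1):
                counts[text[i:i + length]] += 1
    return counts


def language_table(counts, threshold):
    """Keep only the patterns whose frequency exceeds the threshold."""
    return {p for p, c in counts.items() if c > threshold}


def compression_ratio(text, table, max_len):
    """Input symbols per emitted code, using greedy longest-match parsing.

    A symbol absent from the table is emitted as its own code (an assumption
    made here so that unseen symbols do not stall the parser).
    """
    i, codes = 0, 0
    while i < len(text):
        step = 1
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in table:
                step = length
                break
        i += step
        codes += 1
    return len(text) / codes if codes else 0.0


def identify(text, tables, max_len):
    """Pick the language whose table compresses the text best."""
    return max(tables, key=lambda lang: compression_ratio(text, tables[lang], max_len))
```

A higher ratio means the language's frequent patterns cover the input well. This is why pruning rare patterns lowers the score that non-native languages obtain.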
Abstract:
High-rate analysis of channel-optimized vector quantization
This paper considers the high-rate performance of channel-optimized source coding for noisy discrete symmetric channels with random index assignment. Specifically, with mean squared error (MSE) as the performance metric, an upper bound on the asymptotic (i.e., high-rate) distortion is derived by assuming a general structure on the codebook. This structure allows the analysis of the channel-optimized source quantizer to be extended to one with a singular point density. For channels with small error rates, the point density that minimizes the upper bound is continuous. As the error rate increases, the point density becomes singular. The extent of the singularity is also characterized. The accuracy of the expressions obtained is verified through Monte Carlo simulations.
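The upper bound is not reproduced here. The sketch below shows only the kind of Monte Carlo check the abstract mentions, under stated assumptions:

- The source is uniform on [0, 1).
- The quantizer is a plain N-level uniform scalar quantizer, not a channel-optimized one.
- Random index assignment on a symmetric channel is modeled as follows: the index arrives intact with probability 1 - eps, and is otherwise replaced by one of the other N - 1 indices chosen uniformly.

The name `simulate_mse` is hypothetical.

```python
import random


def simulate_mse(n_levels, eps, n_samples=100_000, seed=0):
    """Monte Carlo MSE of a uniform scalar quantizer over a noisy index channel.

    Source: uniform on [0, 1). Channel: the index arrives intact with probability
    1 - eps; otherwise it is replaced by one of the other n_levels - 1 indices,
    chosen uniformly (a model of random index assignment on a symmetric channel).
    """
    rng = random.Random(seed)
    step = 1.0 / n_levels
    total = 0.0
    for _ in range(n_samples):
        x = rng.random()
        i = min(int(x / step), n_levels - 1)  # encoder: cell index
        j = i
        if rng.random() < eps:                # channel error
            j = rng.randrange(n_levels - 1)
            if j >= i:
                j += 1                         # skip the transmitted index
        y = (j + 0.5) * step                  # decoder: cell midpoint
        total += (x - y) ** 2
    return total / n_samples
```

With eps = 0 the estimate should approach the familiar step**2 / 12 granular distortion. Any eps > 0 adds the channel-induced distortion that a channel-optimized codebook is designed to reduce.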
Abstract:
Direction of Arrival (DOA) estimation using a sensor array in the presence of non-Gaussian noise is studied using Fractional Lower-Order Moments (FLOM) matrices. In this paper, a new FLOM-based technique using the Fractional Lower Order Infinity Norm based Covariance (FLIC) matrix is proposed. The boundedness and the low-rank subspace structure of the FLIC matrix are derived. The performance of FLIC-based DOA estimation using MUSIC and ESPRIT is shown to be better than that of other FLOM-based methods.
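The FLIC matrix itself is not defined in this abstract. The following NumPy sketch therefore shows MUSIC applied to a generic FLOM matrix of a commonly used form, C[i, k] = mean over t of x_i(t) x_k(t)* |x_k(t)|^(p-2). This is an assumption, not the paper's FLIC construction. The sketch also assumes a uniform linear array with half-wavelength spacing. Because such a matrix need not be Hermitian, its Hermitian part is decomposed. All function names are hypothetical.

```python
import numpy as np


def steering(theta_deg, m):
    """Steering vector of an m-element uniform linear array, half-wavelength spacing."""
    k = np.arange(m)
    return np.exp(1j * np.pi * k * np.sin(np.deg2rad(theta_deg)))


def flom_matrix(x, p):
    """Generic FLOM matrix: C[i, k] = mean_t x_i(t) * conj(x_k(t)) * |x_k(t)|**(p - 2).

    x has shape (m sensors, T snapshots); p = 2 gives the sample covariance.
    """
    weighted = x * np.abs(x) ** (p - 2)
    return x @ weighted.conj().T / x.shape[1]


def music_spectrum(c, n_sources, grid_deg):
    """MUSIC pseudo-spectrum from the Hermitian part of a (possibly non-Hermitian) matrix."""
    m = c.shape[0]
    h = 0.5 * (c + c.conj().T)
    _, vecs = np.linalg.eigh(h)        # eigenvalues in ascending order
    noise = vecs[:, : m - n_sources]   # noise subspace
    spec = []
    for theta in grid_deg:
        proj = noise.conj().T @ steering(theta, m)
        spec.append(1.0 / np.real(np.vdot(proj, proj)))
    return np.array(spec)
```

Lowering p below 2 reduces the influence of large, impulsive noise samples on the matrix. This is the motivation for fractional lower-order statistics under heavy-tailed noise.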
Abstract:
This paper presents a method of designing a programmable signal processor based on a bit-parallel vector-matrix multiplier (linear transformer). The salient feature of this design is that it improves the efficiency of the direct vector-matrix multiplier and makes the VLSI design much simpler. It achieves this by trading the more expensive arithmetic operation (multiplication) for 'cheaper' manipulation (addition/subtraction) of the data.
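The general idea of trading multiplications for additions can be sketched with a bit-plane decomposition, assuming non-negative integer inputs of `n_bits` bits. In that case y = A x = sum over b of 2^b (A x_b), where x_b is the 0/1 vector holding bit b of every input. Each A x_b is then just a sum of selected columns of A, and the factor 2^b is a shift. This is a standard decomposition chosen for illustration; it is not necessarily the architecture in the paper. The name `matvec_shift_add` is hypothetical.

```python
def matvec_shift_add(a, x, n_bits):
    """Compute a @ x for non-negative integer x using only additions and shifts.

    a: list of rows of integers; x: list of integers with 0 <= x_j < 2**n_bits.
    """
    rows = len(a)
    y = [0] * rows
    for b in range(n_bits):
        # Partial result for bit plane b: add the columns where bit b of x_j is 1.
        partial = [0] * rows
        for j, xj in enumerate(x):
            if (xj >> b) & 1:
                for i in range(rows):
                    partial[i] += a[i][j]
        # Weight the bit plane by 2**b with a left shift and accumulate.
        for i in range(rows):
            y[i] += partial[i] << b
    return y
```

For example, `matvec_shift_add([[1, -2, 3], [4, 5, -6]], [7, 0, 5], 3)` gives the same result as the direct product, `[22, -2]`.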
Abstract:
The instants at which significant excitation of the vocal tract takes place during voicing are referred to as epochs. Epochs and the strengths of the excitation pulses at epochs are useful in characterizing the voice source. The epoch filtering technique proposed by the authors determines epochs from the speech waveform. In this paper, we propose zero-phase inverse filtering to obtain the strengths of the excitation pulses at epochs. The zero-phase inverse filter compensates for the gross spectral envelope of the short-time speech spectrum without affecting its phase characteristics. Linear prediction analysis is used to realize the zero-phase inverse filter. Source characteristics that can be derived from speech using this technique are illustrated with examples.
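A minimal NumPy sketch of a zero-phase inverse filter built from linear prediction follows. It rests on two assumptions. First, the LP coefficients come from the autocorrelation method (Levinson-Durbin recursion) on a Hamming-windowed frame. Second, the filter is applied by scaling the DFT of the frame by |A(e^{jw})|, which is real and non-negative, so the phase of the spectrum is left unchanged. Circular (DFT) filtering is a simplification, and the function names are hypothetical.

```python
import numpy as np


def lpc(frame, order):
    """LP coefficients a (a[0] = 1) by the autocorrelation method (Levinson-Durbin)."""
    x = frame * np.hamming(len(frame))
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                    # reflection coefficient
        prev = a.copy()
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k                # updated prediction error power
    return a


def zero_phase_inverse_filter(x, a):
    """Scale the spectrum of x by |A(e^{jw})| only, leaving its phase unchanged."""
    n = len(x)
    gain = np.abs(np.fft.rfft(a, n))      # |A| on the n-point DFT grid (real, >= 0)
    return np.fft.irfft(np.fft.rfft(x) * gain, n)
```

Because the gain is real and non-negative, an impulse at sample n0 still peaks at n0 after filtering. The spectral envelope is flattened without shifting the excitation instants.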
Abstract:
Parallel sub-word recognition (PSWR) is a new model proposed for language identification (LID) that does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs front-end tokenization in terms of sub-word units, which are designed by automatic segmentation, segment clustering, and segment HMM modeling. We develop PSWR-based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This framework comprises a front-end tokenizer and a back-end language model for each language to be identified. Considering various combinations of the statistical evaluation scores, PSWR is found to perform as well as PPR, even with broad acoustic sub-word tokenization. This makes it an efficient alternative to the PPR system.
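The back-end stage can be sketched in Python, assuming that each language's language model is an add-one-smoothed bigram model over the sub-word token labels. The abstract does not specify the model order or smoothing. For simplicity, the sketch scores a single token string against every language, whereas in PSWR each language has its own tokenizer. The class and function names are hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model over sub-word token labels."""

    def __init__(self, sequences, vocab):
        self.vocab = set(vocab)
        self.bigrams = Counter()
        self.unigrams = Counter()
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.unigrams[prev] += 1

    def log_prob(self, seq):
        """Log-likelihood of a token sequence under the smoothed bigram model."""
        v = len(self.vocab)
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.unigrams[p] + v))
            for p, c in zip(seq, seq[1:])
        )


def identify(tokens, models):
    """Return the language whose back-end model scores the token string highest."""
    return max(models, key=lambda lang: models[lang].log_prob(tokens))
```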
Abstract:
We address the problem of phase retrieval, which is frequently encountered in optical imaging. The measured quantity is the magnitude of the Fourier spectrum of a function (in optics, the function is also referred to as an object). The goal is to recover the object from the magnitude measurements. The standard assumptions are that the object is compactly supported and positive. In this paper, we consider objects that admit a sparse representation in some orthonormal basis. We develop a variant of the Fienup algorithm that incorporates the sparsity condition and successively estimates and refines the phase, starting from the magnitude measurements. We show that the proposed iterative algorithm possesses Cauchy convergence properties. As for the imaging modality, we work with measurements obtained from a frequency-domain optical-coherence tomography experimental setup. The experimental results on real measured data show that the proposed technique exhibits good reconstruction performance even when fewer coefficients are used for reconstruction. Because it estimates the phase accurately, it also suppresses the autocorrelation artifacts to a significant extent.
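The NumPy sketch below shows a Fienup-style error-reduction loop with a sparsity constraint. It assumes the object is itself sparse (that is, the sparsifying basis is the canonical one), real, and non-negative. Each iteration alternates two exact projections. The first imposes the measured Fourier magnitude while keeping the current phase. The second imposes k-sparsity and non-negativity in the object domain. The function names are hypothetical, and this is not the authors' exact variant.

```python
import numpy as np


def project_sparse_nonneg(z, k):
    """Nearest real, non-negative, k-sparse vector to the complex vector z."""
    v = np.maximum(np.real(z), 0.0)
    out = np.zeros_like(v)
    keep = np.argsort(v)[-k:]  # indices of the k largest entries
    out[keep] = v[keep]
    return out


def sparse_error_reduction(magnitude, k, n_iter=200, seed=0):
    """Error reduction with a sparsity constraint; returns estimate and error history."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    g = project_sparse_nonneg(np.fft.ifft(magnitude * phase, norm="ortho"), k)
    errors = []
    for _ in range(n_iter):
        G = np.fft.fft(g, norm="ortho")
        # Keep the current phase, impose the measured magnitude (phase 1 where G == 0).
        unit = np.where(np.abs(G) > 0, G / np.maximum(np.abs(G), 1e-300), 1.0)
        G_fixed = magnitude * unit
        errors.append(np.linalg.norm(G - G_fixed))
        g = project_sparse_nonneg(np.fft.ifft(G_fixed, norm="ortho"), k)
    return g, errors
```

The unitary FFT (`norm="ortho"`) and exact projections keep the Fourier-magnitude error non-increasing from one iteration to the next. Exact recovery is not guaranteed, since shifts and flips of the object have the same Fourier magnitude.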
Abstract:
Edge-preserving smoothing is widely used in image processing, and bilateral filtering is one way to achieve it. The bilateral filter is a nonlinear combination of domain and range filters. Implementing the classical bilateral filter is computationally intensive owing to the nonlinearity of the range filter. In the standard form, the domain and range filters are Gaussian functions, and the performance depends on the choice of the filter parameters. Recently, a constant-time implementation of the bilateral filter was proposed, based on a raised-cosine approximation to the Gaussian, to allow fast implementation. We address the problem of determining the optimal parameters for this raised-cosine-based constant-time implementation. To determine the optimal parameters, we propose the use of Stein's unbiased risk estimator (SURE). The fast bilateral filter accelerates the search for optimal parameters by speeding up the optimization of the SURE cost. Experimental results show that the SURE-optimal raised-cosine-based bilateral filter has nearly the same performance as the SURE-optimal standard Gaussian bilateral filter and the oracle mean squared error (MSE)-based optimal bilateral filter.
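The raised-cosine approximation can be checked numerically with the small Python sketch below. It assumes the kernel form cos(t / (sigma sqrt(n)))^n, which tends to exp(-t^2 / (2 sigma^2)) as n grows while the argument stays within [-pi/2, pi/2]. It also checks the binomial expansion cos^n(x) = 2^-n sum_k C(n, k) cos((2k - n) x). This expansion writes the range kernel as a sum of n + 1 cosines, each of which separates into products of functions of the two pixel values. The parameterization and the function names are assumptions made for illustration.

```python
import math


def gaussian(t, sigma):
    """Gaussian range kernel exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2.0 * sigma * sigma))


def raised_cosine(t, sigma, n):
    """cos(t / (sigma * sqrt(n)))**n, valid while |t| / (sigma * sqrt(n)) <= pi / 2."""
    return math.cos(t / (sigma * math.sqrt(n))) ** n


def raised_cosine_expanded(t, sigma, n):
    """Same kernel via cos^n(x) = 2^-n * sum_k C(n, k) * cos((2k - n) * x)."""
    x = t / (sigma * math.sqrt(n))
    return sum(math.comb(n, k) * math.cos((2 * k - n) * x) for k in range(n + 1)) / 2 ** n
```

Larger n tightens the approximation to the Gaussian but adds terms to the expansion. This is the accuracy/speed trade-off behind the choice of filter parameters.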
Abstract:
Human detection is a challenging problem, more so under adverse conditions such as occlusion and background clutter. This paper addresses the problem by representing a feature extracted from an image as a sparse linear combination of chosen dictionary atoms. Detection, together with scale estimation, is performed using the coefficients obtained from the sparse representation. This is of particular interest because we handle scale with a scale-embedded dictionary, whereas conventional methods detect the object by running the detection window at all scales.
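The following NumPy sketch shows how sparse coefficients can indicate scale. It assumes a dictionary whose atoms are grouped into blocks, one block per scale, and it uses orthogonal matching pursuit (OMP) as the sparse coder; the paper's solver is not named here. The scale is read off as the block that carries the most coefficient energy. The function names and the block layout are hypothetical.

```python
import numpy as np


def omp(dictionary, y, n_atoms):
    """Orthogonal matching pursuit: greedy sparse code of y over unit-norm columns."""
    residual = y.copy()
    support = []
    coef = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        j = int(np.argmax(np.abs(dictionary.T @ residual)))  # best-correlated atom
        if j not in support:
            support.append(j)
        sub = dictionary[:, support]
        sol, *_ = np.linalg.lstsq(sub, y, rcond=None)        # refit on the support
        residual = y - sub @ sol
    coef[support] = sol
    return coef


def best_scale(coef, block_of_atom):
    """Scale whose block of atoms carries the most coefficient energy."""
    energy = {}
    for c, block in zip(coef, block_of_atom):
        energy[block] = energy.get(block, 0.0) + c * c
    return max(energy, key=energy.get)
```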
Abstract:
The classical approach to A/D conversion is uniform sampling, which yields perfect reconstruction of bandlimited signals when the Nyquist sampling theorem is satisfied. We propose a non-uniform sampling scheme based on level-crossing (LC) time information. We show stable reconstruction of bandpass signals with the correct scale factor, and hence a unique reconstruction, from the non-uniform time information alone. For reconstruction from the level crossings, we use sparse-reconstruction-based optimization, constraining the bandpass signal to be sparse in its frequency content. Whereas the literature resorts to overdetermined systems of equations, we use an underdetermined approach together with a sparse reconstruction formulation. In the noiseless case, we obtain a reconstruction SNR above 20 dB and perfect support recovery with probability close to 1; in the noisy case, the probability is lower. Randomly picking LCs from different levels, over the same limited signal duration and for the same amount of information, is seen to be advantageous for reconstruction.
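The sampling front end can be sketched in NumPy. The sketch extracts level-crossing instants from a finely sampled signal by linear interpolation between the two samples that straddle each level; that interpolation is an assumption made for illustration. The sparse frequency-domain reconstruction described above is not reproduced. The function name is hypothetical.

```python
import numpy as np


def level_crossing_times(t, x, levels):
    """Linearly interpolated instants where the sampled signal x(t) crosses each level.

    Only strict sign changes between neighbouring samples are detected, so a
    sample lying exactly on a level is not reported as a crossing.
    """
    times = {}
    for lev in levels:
        d = x - lev
        idx = np.flatnonzero(d[:-1] * d[1:] < 0)  # sign change between samples
        frac = d[idx] / (d[idx] - d[idx + 1])     # fraction of the sample interval
        times[lev] = t[idx] + frac * (t[idx + 1] - t[idx])
    return times
```

Only these instants, not the signal amplitudes, are passed on to the reconstruction stage. This is why the scale factor has to be recovered from timing alone.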
Abstract:
This paper considers the problem of weak signal detection in the presence of navigation data bits for Global Navigation Satellite System (GNSS) receivers. Typically, a set of partial coherent integration outputs is non-coherently accumulated to combat the effects of model uncertainties, such as the presence of navigation data bits and/or frequency uncertainty. This results in a sub-optimal test statistic. In this work, the test statistic for weak signal detection in the presence of navigation data bits is derived from the likelihood ratio. It is highlighted that averaging this likelihood-ratio-based test statistic over the prior distributions of the unknown data bits and the carrier phase uncertainty leads to the conventional Post Detection Integration (PDI) technique for detection. To improve performance in the presence of model uncertainties, a novel cyclostationarity-based sub-optimal PDI technique is proposed. The test statistic is analytically characterized and shown to be robust to navigation data bits and to frequency, phase, and noise uncertainties. Monte Carlo simulation results illustrate the validity of the theoretical results and the superior performance of the proposed detector in the presence of model uncertainties.
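The effect that motivates conventional PDI can be shown with a small NumPy sketch, under simplifying assumptions: a noise-free baseband signal, an unknown carrier phase, and data-bit boundaries aligned with the coherent blocks. Full coherent integration cancels when the data bit flips. Summing the squared magnitudes of per-block coherent sums does not. The proposed cyclostationarity-based detector is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def coherent_statistic(y):
    """|sum of all samples|^2: fully coherent integration."""
    return abs(np.sum(y)) ** 2


def pdi_statistic(y, block_len):
    """Conventional non-coherent PDI: sum over blocks of |partial coherent sum|^2."""
    n_blocks = len(y) // block_len
    blocks = y[: n_blocks * block_len].reshape(n_blocks, block_len)
    return float(np.sum(np.abs(blocks.sum(axis=1)) ** 2))
```

For instance, with one bit flip per 20-sample block the coherent statistic is essentially zero, while the PDI statistic equals 4 * 20**2 for four blocks. The price of PDI is the squaring loss once noise is present.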
Abstract:
The notion of the 1-D analytic signal is well understood and has found many applications. At the heart of the analytic signal concept is the Hilbert transform. The problem in extending the concept of the analytic signal to higher dimensions is that there is no unique multidimensional definition of the Hilbert transform. Also, the notion of analyticity is not as well understood in higher dimensions. Of the several 2-D extensions of the Hilbert transform, the spiral-phase quadrature transform, or Riesz transform, seems to be the natural extension. It has attracted a lot of attention, mainly because of its isotropic properties. From the Riesz transform, Larkin et al. constructed a vortex operator, which approximates the quadratures based on asymptotic stationary-phase analysis. In this paper, we give an alternative proof of the quadrature approximation property by invoking the quasi-eigenfunction property of linear, shift-invariant systems. We show that the vortex operator arises as a natural consequence of applying this property. We also characterize the quadrature approximation error in terms of its energy as well as the peak spatial-domain error. Such results are available for 1-D signals, but their 2-D counterparts have not been provided. We also provide simulation results to supplement the analytical calculations.
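A NumPy sketch of the spiral-phase (vortex) operator follows. It multiplies the 2-D spectrum by exp(i phi), where phi is the polar angle of the frequency vector and the DC term is set to zero by assumption. For a plane-wave fringe cos(k . x) with orientation beta, the operator returns i exp(i beta) sin(k . x). Removing that factor therefore yields the quadrature exactly when the frequency falls on a DFT bin. The function name is hypothetical.

```python
import numpy as np


def vortex(f):
    """Spiral-phase (vortex) operator: multiply the 2-D spectrum of f by exp(i*phi)."""
    ny, nx = f.shape
    u = np.fft.fftfreq(nx)[None, :]  # horizontal frequency
    v = np.fft.fftfreq(ny)[:, None]  # vertical frequency
    rho = np.hypot(u, v)
    # exp(i*phi) = (u + i v) / |(u, v)|; the DC term is set to zero.
    spiral = np.where(rho > 0, (u + 1j * v) / np.where(rho > 0, rho, 1.0), 0.0)
    return np.fft.ifft2(spiral * np.fft.fft2(f))
```

For a general image the local orientation varies, so exp(i beta) is not a constant. This is where the approximation error characterized in the abstract comes from.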
Abstract:
We propose a novel, concise, and efficient space-time descriptor for region-based tracking. The regions, represented by covariance matrices within a temporal fragment, are used to estimate this space-time descriptor, which we call the Eigenprofiles (EP). The EP so obtained is used to estimate the covariance matrix of features over spatio-temporal fragments. The second-order statistics of the spatio-temporal fragments form our target model, which can be adapted to variations across the video. Because the model is concise, it also allows multiple spatially overlapping fragments to represent the target. We demonstrate good tracking results on very challenging datasets shot under poor illumination conditions.
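The abstract does not detail how the EP are built. The NumPy sketch below therefore shows only the underlying region covariance representation, together with a commonly used dissimilarity between covariance matrices, the Förstner-Moonen metric (the square root of the sum of squared logs of the generalized eigenvalues). The per-pixel feature set (x, y, I, |Ix|, |Iy|) and the metric are assumptions, not details taken from the paper. The function names are hypothetical.

```python
import numpy as np


def region_covariance(patch):
    """Covariance of per-pixel features (x, y, I, |Ix|, |Iy|) over an image patch."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(patch.astype(float))  # derivatives along rows, then columns
    feats = np.stack([xs, ys, patch, np.abs(ix), np.abs(iy)], axis=0).reshape(5, -1)
    return np.cov(feats)


def covariance_distance(c1, c2):
    """Forstner-Moonen metric: sqrt(sum log^2 of generalized eigenvalues of (c1, c2))."""
    l_inv = np.linalg.inv(np.linalg.cholesky(c2))
    lam = np.linalg.eigvalsh(l_inv @ c1 @ l_inv.T)  # eigenvalues of c2^-1 c1
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

A covariance matrix stays 5 x 5 regardless of patch size, which is what makes such target models compact enough to use many overlapping fragments.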
Abstract:
An epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high-performance epoch extraction algorithms require either dynamic programming techniques or a priori information about the average pitch period. An algorithm without such requirements is proposed, based on the integrated linear prediction residual (ILPR), which resembles the voice source signal. The half-wave rectified and negated ILPR (or the Hilbert transform of the ILPR) is used as the pre-processed signal. A new non-linear temporal measure, named the plosion index (PI), is proposed for detecting 'transients' in the speech signal. An extension of the PI, called the dynamic plosion index (DPI), is applied to the pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases that provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for robustness in the presence of additive white and babble noise and on simulated telephone-quality speech. The performance of the DPI algorithm is found to be comparable to or better than that of five state-of-the-art techniques in the experiments considered.
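A minimal NumPy sketch of a plosion-index computation follows. It assumes PI(n0, m1, m2) is the ratio of |s[n0]| to the mean absolute value of the m2 samples that precede n0 - m1; the abstract does not define PI, so treat this form as an assumption. The dynamic extension (DPI) is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def plosion_index(s, n0, m1, m2):
    """|s[n0]| over the mean |s| of the m2 samples preceding n0 - m1 (assumed form).

    Requires n0 - m1 - m2 >= 0 and a non-zero reference window.
    """
    window = np.abs(s[n0 - m1 - m2 : n0 - m1])
    return float(np.abs(s[n0]) / np.mean(window))
```

A sample that stands out sharply from the preceding samples, such as a glottal-closure pulse in the pre-processed signal, gives a large ratio. In steady segments the ratio stays near 1.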
Abstract:
We address the problem of multi-instrument recognition in polyphonic music signals. Individual instruments are modeled within a stochastic framework using Student's-t Mixture Models (tMMs). We impose a mixture of these instrument models on the polyphonic signal model. No a priori knowledge is assumed about the number of instruments in the polyphony. The mixture weights are estimated from the polyphonic data in a latent variable framework, using an Expectation Maximization (EM) algorithm derived for the proposed approach. The weights are shown to indicate instrument activity. The output of the algorithm is an Instrument Activity Graph (IAG), from which the instruments active at a given time can be identified. An average F-ratio of 0.75 is obtained for polyphonies containing 2-5 instruments, on an experimental test set of 8 instruments: clarinet, flute, guitar, harp, mandolin, piano, trombone, and violin.
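The weight-estimation step can be sketched in NumPy, assuming the instrument models are already trained and held fixed, so that EM updates only the mixture weights. For brevity, each "instrument" below is a single univariate Student's-t density rather than a full multivariate tMM over spectral features. The function names are hypothetical.

```python
import math

import numpy as np


def student_t_pdf(x, loc, scale, nu):
    """Univariate Student's-t density with location, scale, and nu degrees of freedom."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return np.exp(log_norm - (nu + 1) / 2 * np.log1p(z * z / nu))


def em_mixture_weights(likelihoods, n_iter=200):
    """EM for mixture weights only; the component (instrument) models stay fixed.

    likelihoods: array of shape (T, K) holding p_k(x_t) for frame t and model k.
    """
    _, k = likelihoods.shape
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = likelihoods * w                     # E-step: unnormalised posteriors
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                      # M-step: average responsibility
    return w
```

Run on a segment of frames, the resulting weight vector suggests which models are active in that segment. Repeating this over time would give a graph of activity per instrument.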