958 results for audio-visual


Relevance:

20.00%

Publisher:

Abstract:

This paper presents speaker normalization approaches for the audio search task. The conventional state-of-the-art feature set, Mel Frequency Cepstral Coefficients (MFCC), is known to implicitly contain speaker-specific as well as linguistic information, which can be a problem for speaker-independent audio search. In this paper, a universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features based on the scale transform and warped linear prediction are used to compensate for speaker variability in audio matching. The advantage of these features over the conventional feature set is that they apply universal frequency warping to both of the templates being matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) is about 3% higher than that of the state-of-the-art MFCC feature set on the TIMIT database.
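As a rough illustration of frequency warping, the sketch below uses the first-order all-pass (bilinear) mapping that is commonly associated with warped linear prediction. Whether the paper uses exactly this mapping is an assumption, and the function names and the value of ALPHA are hypothetical, chosen only for the example.

```python
import math

def bilinear_warp(omega: float, alpha: float) -> float:
    """Warp a normalized angular frequency omega (radians, 0..pi) with a
    first-order all-pass mapping; alpha must satisfy |alpha| < 1."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def warped_axis(num_bins: int, alpha: float) -> list[float]:
    """Warped positions of num_bins frequencies evenly spaced over [0, pi]."""
    step = math.pi / (num_bins - 1)
    return [bilinear_warp(k * step, alpha) for k in range(num_bins)]

# Assumption for illustration: one fixed warping coefficient is shared by
# both templates, so no per-speaker warping factor has to be searched for.
ALPHA = 0.4  # illustrative value only, not taken from the paper
query_axis = warped_axis(9, ALPHA)
reference_axis = warped_axis(9, ALPHA)
```

The mapping leaves 0 and pi fixed and is monotonic for |alpha| < 1, so it only redistributes resolution along the frequency axis; a positive alpha gives more resolution to low frequencies.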

Relevance:

20.00%

Publisher:

Abstract:

We perceive objects as containing a variety of attributes: local features, relations between features, internal details, and global properties. But we know little about how they combine. Here, we report a remarkably simple additive rule that governs how these diverse object attributes combine in vision. The perceived dissimilarity between two objects was accurately explained as a sum of (a) spatially tuned local contour-matching processes modulated by part decomposition; (b) differences in internal details, such as texture; (c) differences in emergent attributes, such as symmetry; and (d) differences in global properties, such as orientation or overall configuration of parts. Our results elucidate an enduring question in object vision by showing that the whole object is not a sum of its parts but a sum of its many attributes.

Relevance:

20.00%

Publisher:

Abstract:

In this paper we derive the a posteriori probability for the location of bursts of noise additively superimposed on a Gaussian AR process. The theory is developed to give a sequentially based restoration algorithm suitable for real-time applications. The algorithm is particularly appropriate for digital audio restoration, where clicks and scratches may be modelled as additive bursts of noise. Experiments are carried out on both real audio data and synthetic AR processes, and significant improvements are demonstrated over existing restoration techniques. © 1995 IEEE
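To make the signal model concrete, the sketch below simulates a Gaussian AR(2) process, adds a single click, and flags samples whose linear prediction error is large. This is plain residual thresholding, a much simpler stand-in for the a posteriori probability derived in the paper. The AR coefficients are assumed known, and all names and numeric values are hypothetical.

```python
import random

def simulate_ar2(n, a1, a2, sigma, seed=0):
    """Generate n samples of x[t] = a1*x[t-1] + a2*x[t-2] + e[t],
    with Gaussian excitation e[t] of standard deviation sigma."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n - 2):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, sigma))
    return x

def prediction_residual(y, a1, a2):
    """Prediction error of y under the AR(2) model (first two samples set to 0)."""
    return [0.0, 0.0] + [y[t] - a1 * y[t - 1] - a2 * y[t - 2]
                         for t in range(2, len(y))]

def detect_bursts(y, a1, a2, threshold):
    """Indices whose absolute prediction error exceeds the threshold."""
    residual = prediction_residual(y, a1, a2)
    return [t for t, r in enumerate(residual) if abs(r) > threshold]

# Illustrative values only: a stable AR(2) model with small excitation.
A1, A2, SIGMA = 1.5, -0.9, 0.01
clean = simulate_ar2(200, A1, A2, SIGMA)

# Model a click as an additive burst of length one at a known position.
CLICK_AT = 100
corrupted = list(clean)
corrupted[CLICK_AT] += 1.0

flagged = detect_bursts(corrupted, A1, A2, threshold=0.5)
```

A single corrupted sample disturbs the prediction error at its own index and at the following two indices, because the AR(2) predictor looks two samples back. This smearing is one reason a probabilistic treatment of burst location is preferable to a bare threshold.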

Relevance:

20.00%

Publisher:

Abstract:

Statistical model-based methods are presented for the reconstruction of autocorrelated signals in impulsive plus continuous noise environments. Signals are modelled as autoregressive and noise sources as discrete and continuous mixtures of Gaussians, allowing for robustness in highly impulsive and non-Gaussian environments. Markov Chain Monte Carlo methods are used for reconstruction of the corrupted waveforms within a Bayesian probabilistic framework and results are presented for contaminated voice and audio signals.

Relevance:

20.00%

Publisher:

Abstract:

We present a statistical model-based approach to signal enhancement in the case of additive broadband noise. Because broadband noise is localised in neither time nor frequency, its removal is one of the most pervasive and difficult signal enhancement tasks. In order to improve perceived signal quality, we take advantage of human perception and define a best estimate of the original signal in terms of a cost function incorporating perceptual optimality criteria. We derive the resultant signal estimator and implement it in a short-time spectral attenuation framework. Audio examples, references, and further information may be found at http://www-sigproc.eng.cam.ac.uk/~pjw47.
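For orientation, a short-time spectral attenuation scheme multiplies each frequency bin of each frame by a gain between 0 and 1. The sketch below uses a generic Wiener-style gain with a floor, not the perceptually motivated estimator derived in the paper. The function names, the floor value, and the assumption that a per-bin noise power estimate is available are all hypothetical.

```python
def attenuation_gain(observed_power, noise_power, floor=0.1):
    """Wiener-style gain 1 - noise/observed, limited below by a floor.
    The floor keeps heavily attenuated bins from being zeroed out."""
    if observed_power <= 0.0:
        return floor
    return max(1.0 - noise_power / observed_power, floor)

def attenuate_frame(magnitudes, noise_power, floor=0.1):
    """Apply the gain to the spectral magnitudes of one short-time frame.
    magnitudes[k] and noise_power[k] refer to the same frequency bin k."""
    return [attenuation_gain(m * m, n, floor) * m
            for m, n in zip(magnitudes, noise_power)]

# One toy frame with two bins: the first is dominated by signal,
# the second by noise, so the second is attenuated down to the floor.
enhanced = attenuate_frame([2.0, 1.0], [1.0, 2.0])
```

In a full system the magnitudes would come from a short-time Fourier transform and the attenuated magnitudes would be recombined with the original phases before resynthesis; those steps are omitted here.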

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a novel coarse-to-fine global localization approach that is inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by SIFT descriptors are used as natural landmarks. These descriptors are indexed into two databases: an inverted index and a location database. The inverted index is built based on a visual vocabulary learned from the feature descriptors. In the location database, each location is directly represented by a set of scale-invariant descriptors. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the inverted index is fast but not accurate enough, whereas localization from the location database using a voting algorithm is relatively slow but more accurate. The combination of coarse and fine stages makes fast and reliable localization possible. In addition, if necessary, the localization result can be verified by epipolar geometry between the representative view in the database and the view to be localized. Experimental results show that our approach is efficient and reliable. ©2005 IEEE.
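The two-stage lookup described above can be sketched with toy data. This is a minimal illustration under stated assumptions, not the paper's implementation: descriptors are 2-D tuples standing in for SIFT vectors, the vocabulary is given rather than learned, the distance threshold is arbitrary, and every name is hypothetical. The epipolar verification step is omitted.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(desc, vocabulary):
    """Index of the visual word nearest to the descriptor."""
    return min(range(len(vocabulary)), key=lambda i: sq_dist(desc, vocabulary[i]))

def build_inverted_index(location_db, vocabulary):
    """Map each visual word to the set of locations in which it occurs."""
    index = defaultdict(set)
    for location, descriptors in location_db.items():
        for desc in descriptors:
            index[quantize(desc, vocabulary)].add(location)
    return index

def coarse_localize(query, index, vocabulary, top_k):
    """Fast stage: rank locations by the number of shared visual words."""
    votes = Counter()
    for desc in query:
        for location in index.get(quantize(desc, vocabulary), ()):
            votes[location] += 1
    return [location for location, _ in votes.most_common(top_k)]

def fine_localize(query, candidates, location_db, max_sq_dist):
    """Slower stage: each query descriptor votes for the candidate that
    holds its nearest stored descriptor, if that match is close enough."""
    votes = Counter()
    for desc in query:
        best_location, best = None, max_sq_dist
        for location in candidates:
            for ref in location_db[location]:
                dist = sq_dist(desc, ref)
                if dist < best:
                    best_location, best = location, dist
        if best_location is not None:
            votes[best_location] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy data (hypothetical): four visual words and three locations.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
location_db = {
    "corridor": [(0.1, 0.0), (0.9, 0.1)],
    "kitchen": [(0.0, 0.9), (1.0, 1.0)],
    "lab": [(0.2, 0.1), (0.3, 0.7)],
}
index = build_inverted_index(location_db, vocabulary)

query = [(0.05, 0.85), (0.95, 0.95)]
candidates = coarse_localize(query, index, vocabulary, top_k=2)
best = fine_localize(query, candidates, location_db, max_sq_dist=0.05)
```

The coarse stage only compares quantized words, so it narrows the search cheaply; the fine stage then compares raw descriptors, but only against the few surviving candidates.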

Relevance:

20.00%

Publisher:

Abstract:

In the life of the Law School, focus on the “visual” can operate at three different levels: learning, teaching, and examining (legal concepts). My main interest in this paper is to explore the last of these levels, “examining”, broadly considered so as to encompass evaluation in general. Furthermore, that interest is narrowed here to the area of constitutional rights and human rights in general, even though the conclusions reached can (and should) likely be extrapolated to other areas of the law... In effect, the first logical step regarding the relevance of the visual approach has to do with using it yourself when you study, assuming that you have concluded that you are a “visual learner”. As you know, VARK theorists propose a quadripartite classification of learners. The acronym VARK stands for the Visual, Aural, Read/write, and Kinesthetic sensory modalities that are used for learning information. This model was designed in the late 1980s by Neil Fleming and it has received some acceptance and a lot of attention...