13 resultados para Perceptual dialectology

em Indian Institute of Science - Bangalore - Índia


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We address the problem of speech enhancement using a risk- estimation approach. In particular, we propose the use the Stein’s unbiased risk estimator (SURE) for solving the problem. The need for a suitable finite-sample risk estimator arises because the actual risks invariably depend on the unknown ground truth. We consider the popular mean-squared error (MSE) criterion first, and then compare it against the perceptually-motivated Itakura-Saito (IS) distortion, by deriving unbiased estimators of the corresponding risks. We use a generalized SURE (GSURE) development, recently proposed by Eldar for MSE. We consider dependent observation models from the exponential family with an additive noise model,and derive an unbiased estimator for the risk corresponding to the IS distortion, which is non-quadratic. This serves to address the speech enhancement problem in a more general setting. Experimental results illustrate that the IS metric is efficient in suppressing musical noise, which affects the MSE-enhanced speech. However, in terms of global signal-to-noise ratio (SNR), the minimum MSE solution gives better results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper suggests a scheme for classifying online handwritten characters, based on dynamic space warping of strokes within the characters. A method for segmenting components into strokes using velocity profiles is proposed. Each stroke is a simple arbitrary shape and is encoded using three attributes. Correspondence between various strokes is established using Dynamic Space Warping. A distance measure which reliably differentiates between two corresponding simple shapes (strokes) has been formulated thus obtaining a perceptual distance measure between any two characters. Tests indicate an accuracy of over 85% on two different datasets of characters.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Tambura is an essential drone accompaniment used in Indian music concerts. It acts as an immediate reference of pitch for both the artists and listeners. The four strings of Tambura are tuned to the frequency ratio :1:1: . Careful listening to Tambura sound reveals that the tonal spectrum is not stationary but is time varying. The object of this study is to make a detailed spectrum analysis to find out the nature of temporal variation of the tonal spectrum of Tambura sound. Results of the analysis are correlated with perceptual evaluation conducted in a controlled acoustic environment. A significant result of this study is to demonstrate the presence of several notes which are normally not noticed even by a professional artist. The effect of bridge in Tambura in producing the so called “live tone” is explained through time and frequency parameters of Tambura sounds.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We extend the recently proposed spectral integration based psychoacoustic model for sinusoidal distortions to the MDCT domain. The estimated masking threshold additionally depends on the sub-band spectral flatness measure of the signal which accounts for the non- sinusoidal distortion introduced by masking. The expressions for masking threshold are derived and the validity of the proposed model is established through perceptual transparency test of audio clips. Test results indicate that we do achieve transparent quality reconstruction with the new model. Performance of the model is compared with MPEG psychoacoustic models with respect to the estimated perceptual entropy (PE). The results show that the proposed model predicts a lower PE than other models.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Communication applications are usually delay restricted, especially for the instance of musicians playing over the Internet. This requires a one-way delay of maximum 25 msec and also a high audio quality is desired at feasible bit rates. The ultra low delay (ULD) audio coding structure is well suited to this application and we investigate further the application of multistage vector quantization (MSVQ) to reach a bit rate range below 64 Kb/s, in a scalable manner. Results at 32 Kb/s and 64 Kb/s show that the trained codebook MSVQ performs best, better than KLT normalization followed by a simulated Gaussian MSVQ or simulated Gaussian MSVQ alone. The results also show that there is only a weak dependence on the training data, and that we indeed converge to the perceptual quality of our previous ULD coder at 64 Kb/s.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Rate control regulates the instantaneous video bit -rate to maximize a picture quality metric while satisfying channel constraints. Typically, a quality metric such as Peak Signalto-Noise ratio (PSNR) or weighted signal -to-noise ratio(WSNR) is chosen out of convenience. However this metric is not always truly representative of perceptual video quality.Attempts to use perceptual metrics in rate control have been limited by the accuracy of the video quality metrics chosen.Recently, new and improved metrics of subjective quality such as the Video quality experts group's (VQEG) NTIA1 General Video Quality Model (VQM) have been proven to have strong correlation with subjective quality. Here, we apply the key principles of the NTIA -VQM model to rate control in order to maximize perceptual video quality. Our experiments demonstrate that applying NTIA -VQM motivated metrics to standard TMN8 rate control in an H.263 encoder results in perceivable quality improvements over a baseline TMN8 / MSE based implementation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

How the brain maintains perceptual continuity across eye movements that yield discontinuous snapshots of the world is still poorly understood. In this study, we adapted a framework from the dual-task paradigm, well suited to reveal bottlenecks in mental processing, to study how information is processed across sequential saccades. The pattern of RTs allowed us to distinguish among three forms of trans-saccadic processing (no trans-saccadic processing, trans-saccadic visual processing and trans-saccadic visual processing and saccade planning models). Using a cued double-step saccade task, we show that even though saccade execution is a processing bottleneck, limiting access to incoming visual information, partial visual and motor processing that occur prior to saccade execution is used to guide the next eye movement. These results provide insights into how the oculomotor system is designed to process information across multiple fixations that occur during natural scanning.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Conventional encryption techniques are usually applicable for text data and often unsuited for encrypting multimedia objects for two reasons. Firstly, the huge sizes associated with multimedia objects make conventional encryption computationally costly. Secondly, multimedia objects come with massive redundancies which are useful in avoiding encryption of the objects in their entirety. Hence a class of encryption techniques devoted to encrypting multimedia objects like images have been developed. These techniques make use of the fact that the data comprising multimedia objects like images could in general be seggregated into two disjoint components, namely salient and non-salient. While the former component contributes to the perceptual quality of the object, the latter only adds minor details to it. In the context of images, the salient component is often much smaller in size than the non-salient component. Encryption effort is considerably reduced if only the salient component is encrypted while leaving the other component unencrypted. A key challenge is to find means to achieve a desirable seggregation so that the unencrypted component does not reveal any information about the object itself. In this study, an image encryption approach that uses fractal structures known as space-filling curves- in order to reduce the encryption overload is presented. In addition, the approach also enables a high quality lossy compression of images.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Visual search in real life involves complex displays with a target among multiple types of distracters, but in the laboratory, it is often tested using simple displays with identical distracters. Can complex search be understood in terms of simple searches? This link may not be straightforward if complex search has emergent properties. One such property is linear separability, whereby search is hard when a target cannot be separated from its distracters using a single linear boundary. However, evidence in favor of linear separability is based on testing stimulus configurations in an external parametric space that need not be related to their true perceptual representation. We therefore set out to assess whether linear separability influences complex search at all. Our null hypothesis was that complex search performance depends only on classical factors such as target-distracter similarity and distracter homogeneity, which we measured using simple searches. Across three experiments involving a variety of artificial and natural objects, differences between linearly separable and nonseparable searches were explained using target-distracter similarity and distracter heterogeneity. Further, simple searches accurately predicted complex search regardless of linear separability (r = 0.91). Our results show that complex search is explained by simple search, refuting the widely held belief that linear separability influences visual search.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In the product conceptualization phase of design, sketches are often used for exploration of diverse behaviour patterns of the components to achieve the required functionality. This paper presents a method to animate the sketch produced using a tablet interface to aid verification of the desired behaviour. A sketch is a spatial organization of strokes whose perceptual organization helps one to visually interpret its components and their interconnections. A Gestalt based segmentation followed by interactive grouping and articulation, presented in this paper, enables one to use a mechanism simulation framework to animate the sketch in a “pick and drag” mode to visualize different configurations of the product and gain insight into the product’s behaviour.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a GPU implementation of normalized cuts for road extraction problem using panchromatic satellite imagery. The roads have been extracted in three stages namely pre-processing, image segmentation and post-processing. Initially, the image is pre-processed to improve the tolerance by reducing the clutter (that mostly represents the buildings, vegetation,. and fallow regions). The road regions are then extracted using the normalized cuts algorithm. Normalized cuts algorithm is a graph-based partitioning `approach whose focus lies in extracting the global impression (perceptual grouping) of an image rather than local features. For the segmented image, post-processing is carried out using morphological operations - erosion and dilation. Finally, the road extracted image is overlaid on the original image. Here, a GPGPU (General Purpose Graphical Processing Unit) approach has been adopted to implement the same algorithm on the GPU for fast processing. A performance comparison of this proposed GPU implementation of normalized cuts algorithm with the earlier algorithm (CPU implementation) is presented. From the results, we conclude that the computational improvement in terms of time as the size of image increases for the proposed GPU implementation of normalized cuts. Also, a qualitative and quantitative assessment of the segmentation results has been projected.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among the story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with reading arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse reading subprocesses during complex story processing and predict the detailed neural representation of diverse story features, ranging from visual word properties to the mention of different story characters and different actions they perform. We construct brain representation maps that replicate many results from a wide range of classical studies that focus each on one aspect of language processing and offer new insights on which type of information is processed by different areas involved in language processing. Additionally, this approach is promising for studying individual differences: it can be used to create single subject maps that may potentially be used to measure reading comprehension and diagnose reading disorders.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Speech enhancement in stationary noise is addressed using the ideal channel selection framework. In order to estimate the binary mask, we propose to classify each time-frequency (T-F) bin of the noisy signal as speech or noise using Discriminative Random Fields (DRF). The DRF function contains two terms - an enhancement function and a smoothing term. On each T-F bin, we propose to use an enhancement function based on likelihood ratio test for speech presence, while Ising model is used as smoothing function for spectro-temporal continuity in the estimated binary mask. The effect of the smoothing function over successive iterations is found to reduce musical noise as opposed to using only enhancement function. The binary mask is inferred from the noisy signal using Iterated Conditional Modes (ICM) algorithm. Sentences from NOIZEUS corpus are evaluated from 0 dB to 15 dB Signal to Noise Ratio (SNR) in 4 kinds of additive noise settings: additive white Gaussian noise, car noise, street noise and pink noise. The reconstructed speech using the proposed technique is evaluated in terms of average segmental SNR, Perceptual Evaluation of Speech Quality (PESQ) and Mean opinion Score (MOS).