925 resultados para predictive coding
Resumo:
Understanding the guiding principles of sensory coding strategies is a main goal in computational neuroscience. Among others, the principles of predictive coding and slowness appear to capture aspects of sensory processing. Predictive coding postulates that sensory systems are adapted to the structure of their input signals such that information about future inputs is encoded. Slow feature analysis (SFA) is a method for extracting slowly varying components from quickly varying input signals, thereby learning temporally invariant features. Here, we use the information bottleneck method to state an information-theoretic objective function for temporally local predictive coding. We then show that the linear case of SFA can be interpreted as a variant of predictive coding that maximizes the mutual information between the current output of the system and the input signal in the next time step. This demonstrates that the slowness principle and predictive coding are intimately related.
Resumo:
During 1990's the Wavelet Transform emerged as an important signal processing tool with potential applications in time-frequency analysis and non-stationary signal processing.Wavelets have gained popularity in broad range of disciplines like signal/image compression, medical diagnostics, boundary value problems, geophysical signal processing, statistical signal processing,pattern recognition,underwater acoustics etc.In 1993, G. Evangelista introduced the Pitch- synchronous Wavelet Transform, which is particularly suited for pseudo-periodic signal processing.The work presented in this thesis mainly concentrates on two interrelated topics in signal processing,viz. the Wavelet Transform based signal compression and the computation of Discrete Wavelet Transform. A new compression scheme is described in which the Pitch-Synchronous Wavelet Transform technique is combined with the popular linear Predictive Coding method for pseudo-periodic signal processing. Subsequently,A novel Parallel Multiple Subsequence structure is presented for the efficient computation of Wavelet Transform. Case studies also presented to highlight the potential applications.
Resumo:
Speech signals are one of the most important means of communication among the human beings. In this paper, a comparative study of two feature extraction techniques are carried out for recognizing speaker independent spoken isolated words. First one is a hybrid approach with Linear Predictive Coding (LPC) and Artificial Neural Networks (ANN) and the second method uses a combination of Wavelet Packet Decomposition (WPD) and Artificial Neural Networks. Voice signals are sampled directly from the microphone and then they are processed using these two techniques for extracting the features. Words from Malayalam, one of the four major Dravidian languages of southern India are chosen for recognition. Training, testing and pattern recognition are performed using Artificial Neural Networks. Back propagation method is used to train the ANN. The proposed method is implemented for 50 speakers uttering 20 isolated words each. Both the methods produce good recognition accuracy. But Wavelet Packet Decomposition is found to be more suitable for recognizing speech because of its multi-resolution characteristics and efficient time frequency localizations
Resumo:
This paper proposes a new compression algorithm for dynamic 3d meshes. In such a sequence of meshes, neighboring vertices have a strong tendency to behave similarly and the degree of dependencies between their locations in two successive frames is very large which can be efficiently exploited using a combination of Predictive and DCT coders (PDCT). Our strategy gathers mesh vertices of similar motions into clusters, establish a local coordinate frame (LCF) for each cluster and encodes frame by frame and each cluster separately. The vertices of each cluster have small variation over a time relative to the LCF. Therefore, the location of each new vertex is well predicted from its location in the previous frame relative to the LCF of its cluster. The difference between the original and the predicted local coordinates are then transformed into frequency domain using DCT. The resulting DCT coefficients are quantized and compressed with entropy coding. The original sequence of meshes can be reconstructed from only a few non-zero DCT coefficients without significant loss in visual quality. Experimental results show that our strategy outperforms or comes close to other coders.
Resumo:
This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.
Resumo:
This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed, which reduces the computational burden and complexity without sacrificing the speech quality, as compared to the conventional adaptive predictive coding (APC) system. For this, changes in the evaluation methods have been evolved. This method is as different from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the replacement of the high order predictor in the transmitter section of a predictive coding system, by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in the processing of the speech signal is adjusted relative to the pitch period of the signal being processed rather than choosing a constant length as hitherto done by other researchers. The efficiency of the newly proposed coder has been supported with results of computer simulation using real speech data. Three methods for voiced/unvoiced/silent/transition classification have been presented. The first one is based on energy, zerocrossing rate and the periodicity of the waveform. The second method uses normalised correlation coefficient as the main parameter, while the third method utilizes a pitch-dependent correlation factor. The third algorithm which gives the minimum error probability has been chosen in a later chapter to design the modified coder The thesis also presents a comparazive study beh-cm the autocorrelation and the covariance methods used in the evaluaiicn of the predictor parameters. It has been proved that the azztocorrelation method is superior to the covariance method with respect to the filter stabf-it)‘ and also in an SNR sense, though the increase in gain is only small. The Modified Block Adaptive Coder applies a switching from pitch precitzion to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments cont;-:ted in coding, transmission and simulation, used speech samples from .\£=_‘ajr2_1a:r1 and English phrases. Proposal for a speaker reecgnifion syste: and a phoneme identification system has also been outlized towards the end of the thesis.
Resumo:
To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects: visuoauditory incongruency effects were selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).
Resumo:
This paper considers the high-rate performance of source coding for noisy discrete symmetric channels with random index assignment (IA). Accurate analytical models are developed to characterize the expected distortion performance of vector quantization (VQ) for a large class of distortion measures. It is shown that when the point density is continuous, the distortion can be approximated as the sum of the source quantization distortion and the channel-error induced distortion. Expressions are also derived for the continuous point density that minimizes the expected distortion. Next, for the case of mean squared error distortion, a more accurate analytical model for the distortion is derived by allowing the point density to have a singular component. The extent of the singularity is also characterized. These results provide analytical models for the expected distortion performance of both conventional VQ as well as for channel-optimized VQ. As a practical example, compression of the linear predictive coding parameters in the wideband speech spectrum is considered, with the log spectral distortion as performance metric. The theory is able to correctly predict the channel error rate that is permissible for operation at a particular level of distortion.
Resumo:
Objective identification and description of mimicked calls is a primary component of any study on avian vocal mimicry but few studies have adopted a quantitative approach. We used spectral feature representations commonly used in human speech analysis in combination with various distance metrics to distinguish between mimicked and non-mimicked calls of the greater racket-tailed drongo, Dicrurus paradiseus and cross-validated the results with human assessment of spectral similarity. We found that the automated method and human subjects performed similarly in terms of the overall number of correct matches of mimicked calls to putative model calls. However, the two methods also misclassified different subsets of calls and we achieved a maximum accuracy of ninety five per cent only when we combined the results of both the methods. This study is the first to use Mel-frequency Cepstral Coefficients and Relative Spectral Amplitude - filtered Linear Predictive Coding coefficients to quantify vocal mimicry. Our findings also suggest that in spite of several advances in automated methods of song analysis, corresponding cross-validation by humans remains essential.
Resumo:
The need for low bit-rate speech coding is the result of growing demand on the available radio bandwidth for mobile communications both for military purposes and for the public sector. To meet this growing demand it is required that the available bandwidth be utilized in the most economic way to accommodate more services. Two low bit-rate speech coders have been built and tested in this project. The two coders combine predictive coding with delta modulation, a property which enables them to achieve simultaneously the low bit-rate and good speech quality requirements. To enhance their efficiency, the predictor coefficients and the quantizer step size are updated periodically in each coder. This enables the coders to keep up with changes in the characteristics of the speech signal with time and with changes in the dynamic range of the speech waveform. However, the two coders differ in the method of updating their predictor coefficients. One updates the coefficients once every one hundred sampling periods and extracts the coefficients from input speech samples. This is known in this project as the Forward Adaptive Coder. Since the coefficients are extracted from input speech samples, these must be transmitted to the receiver to reconstruct the transmitted speech sample, thus adding to the transmission bit rate. The other updates its coefficients every sampling period, based on information of output data. This coder is known as the Backward Adaptive Coder. Results of subjective tests showed both coders to be reasonably robust to quantization noise. Both were graded quite good, with the Forward Adaptive performing slightly better, but with a slightly higher transmission bit rate for the same speech quality, than its Backward counterpart. The coders yielded acceptable speech quality of 9.6kbps for the Forward Adaptive and 8kbps for the Backward Adaptive.
Resumo:
Background - When a moving stimulus and a briefly flashed static stimulus are physically aligned in space the static stimulus is perceived as lagging behind the moving stimulus. This vastly replicated phenomenon is known as the Flash-Lag Effect (FLE). For the first time we employed biological motion as the moving stimulus, which is important for two reasons. Firstly, biological motion is processed by visual as well as somatosensory brain areas, which makes it a prime candidate for elucidating the interplay between the two systems with respect to the FLE. Secondly, discussions about the mechanisms of the FLE tend to recur to evolutionary arguments, while most studies employ highly artificial stimuli with constant velocities. Methodology/Principal Finding - Since biological motion is ecologically valid it follows complex patterns with changing velocity. We therefore compared biological to symbolic motion with the same acceleration profile. Our results with 16 observers revealed a qualitatively different pattern for biological compared to symbolic motion and this pattern was predicted by the characteristics of motor resonance: The amount of anticipatory processing of perceived actions based on the induced perspective and agency modulated the FLE. Conclusions/Significance - Our study provides first evidence for an FLE with non-linear motion in general and with biological motion in particular. Our results suggest that predictive coding within the sensorimotor system alone cannot explain the FLE. Our findings are compatible with visual prediction (Nijhawan, 2008) which assumes that extrapolated motion representations within the visual system generate the FLE. These representations are modulated by sudden visual input (e.g. offset signals) or by input from other systems (e.g. sensorimotor) that can boost or attenuate overshooting representations in accordance with biased neural competition (Desimone & Duncan, 1995).