978 resultados para linear predictive coding (LPC)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech signals are one of the most important means of communication among the human beings. In this paper, a comparative study of two feature extraction techniques are carried out for recognizing speaker independent spoken isolated words. First one is a hybrid approach with Linear Predictive Coding (LPC) and Artificial Neural Networks (ANN) and the second method uses a combination of Wavelet Packet Decomposition (WPD) and Artificial Neural Networks. Voice signals are sampled directly from the microphone and then they are processed using these two techniques for extracting the features. Words from Malayalam, one of the four major Dravidian languages of southern India are chosen for recognition. Training, testing and pattern recognition are performed using Artificial Neural Networks. Back propagation method is used to train the ANN. The proposed method is implemented for 50 speakers uttering 20 isolated words each. Both the methods produce good recognition accuracy. But Wavelet Packet Decomposition is found to be more suitable for recognizing speech because of its multi-resolution characteristics and efficient time frequency localizations

Relevância:

100.00% 100.00%

Publicador:

Resumo:

During 1990's the Wavelet Transform emerged as an important signal processing tool with potential applications in time-frequency analysis and non-stationary signal processing.Wavelets have gained popularity in broad range of disciplines like signal/image compression, medical diagnostics, boundary value problems, geophysical signal processing, statistical signal processing,pattern recognition,underwater acoustics etc.In 1993, G. Evangelista introduced the Pitch- synchronous Wavelet Transform, which is particularly suited for pseudo-periodic signal processing.The work presented in this thesis mainly concentrates on two interrelated topics in signal processing,viz. the Wavelet Transform based signal compression and the computation of Discrete Wavelet Transform. A new compression scheme is described in which the Pitch-Synchronous Wavelet Transform technique is combined with the popular linear Predictive Coding method for pseudo-periodic signal processing. Subsequently,A novel Parallel Multiple Subsequence structure is presented for the efficient computation of Wavelet Transform. Case studies also presented to highlight the potential applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Understanding the guiding principles of sensory coding strategies is a main goal in computational neuroscience. Among others, the principles of predictive coding and slowness appear to capture aspects of sensory processing. Predictive coding postulates that sensory systems are adapted to the structure of their input signals such that information about future inputs is encoded. Slow feature analysis (SFA) is a method for extracting slowly varying components from quickly varying input signals, thereby learning temporally invariant features. Here, we use the information bottleneck method to state an information-theoretic objective function for temporally local predictive coding. We then show that the linear case of SFA can be interpreted as a variant of predictive coding that maximizes the mutual information between the current output of the system and the input signal in the next time step. This demonstrates that the slowness principle and predictive coding are intimately related.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The algebraic formulation for linear network coding in acyclic networks with each link having an integer delay is well known. Based on this formulation, for a given set of connections over an arbitrary acyclic network with integer delay assumed for the links, the output symbols at the sink nodes at any given time instant is a Fq-linear combination of the input symbols across different generations, where Fq denotes the field over which the network operates. We use finite-field discrete Fourier transform (DFT) to convert the output symbols at the sink nodes at any given time instant into a Fq-linear combination of the input symbols generated during the same generation. We call this as transforming the acyclic network with delay into n-instantaneous networks (n is sufficiently large). We show that under certain conditions, there exists a network code satisfying sink demands in the usual (non-transform) approach if and only if there exists a network code satisfying sink demands in the transform approach. Furthermore, assuming time invariant local encoding kernels, we show that the transform method can be employed to achieve half the rate corresponding to the individual source-destination mincut (which are assumed to be equal to 1) for some classes of three-source three-destination multiple unicast network with delays using alignment strategies when the zero-interference condition is not satisfied.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Discrete polymatroids are the multi-set analogue of matroids. In this paper, we explore the connections between linear index coding and representable discrete polymatroids. The index coding problem involves a sender which generates a set of messages X = {x(1), x(2), ... x(k)} and a set of receivers R which demand messages. A receiver R is an element of R is specified by the tuple (x, H) where x. X is the message demanded by R and H subset of X \textbackslash {x} is the side information possessed by R. It is first shown that a linear solution to an index coding problem exists if and only if there exists a representable discrete polymatroid satisfying certain conditions which are determined by the index coding problem considered. El Rouayheb et. al. showed that the problem of finding a multi-linear representation for a matroid can be reduced to finding a perfect linear index coding solution for an index coding problem obtained from that matroid. Multi-linear representation of a matroid can be viewed as a special case of representation of an appropriate discrete polymatroid. We generalize the result of El Rouayheb et. al. by showing that the problem of finding a representation for a discrete polymatroid can be reduced to finding a perfect linear index coding solution for an index coding problem obtained from that discrete polymatroid.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed, which reduces the computational burden and complexity without sacrificing the speech quality, as compared to the conventional adaptive predictive coding (APC) system. For this, changes in the evaluation methods have been evolved. This method is as different from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the replacement of the high order predictor in the transmitter section of a predictive coding system, by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in the processing of the speech signal is adjusted relative to the pitch period of the signal being processed rather than choosing a constant length as hitherto done by other researchers. The efficiency of the newly proposed coder has been supported with results of computer simulation using real speech data. Three methods for voiced/unvoiced/silent/transition classification have been presented. The first one is based on energy, zerocrossing rate and the periodicity of the waveform. The second method uses normalised correlation coefficient as the main parameter, while the third method utilizes a pitch-dependent correlation factor. The third algorithm which gives the minimum error probability has been chosen in a later chapter to design the modified coder The thesis also presents a comparazive study beh-cm the autocorrelation and the covariance methods used in the evaluaiicn of the predictor parameters. It has been proved that the azztocorrelation method is superior to the covariance method with respect to the filter stabf-it)‘ and also in an SNR sense, though the increase in gain is only small. The Modified Block Adaptive Coder applies a switching from pitch precitzion to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments cont;-:ted in coding, transmission and simulation, used speech samples from .\£=_‘ajr2_1a:r1 and English phrases. Proposal for a speaker reecgnifion syste: and a phoneme identification system has also been outlized towards the end of the thesis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Forensic speaker comparison exams have complex characteristics, demanding a long time for manual analysis. A method for automatic recognition of vowels, providing feature extraction for acoustic analysis is proposed, aiming to contribute as a support tool in these exams. The proposal is based in formant measurements by LPC (Linear Predictive Coding), selectively by fundamental frequency detection, zero crossing rate, bandwidth and continuity, with the clustering being done by the k-means method. Experiments using samples from three different databases have shown promising results, in which the regions corresponding to five of the Brasilian Portuguese vowels were successfully located, providing visualization of a speaker’s vocal tract behavior, as well as the detection of segments corresponding to target vowels.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper considers the high-rate performance of source coding for noisy discrete symmetric channels with random index assignment (IA). Accurate analytical models are developed to characterize the expected distortion performance of vector quantization (VQ) for a large class of distortion measures. It is shown that when the point density is continuous, the distortion can be approximated as the sum of the source quantization distortion and the channel-error induced distortion. Expressions are also derived for the continuous point density that minimizes the expected distortion. Next, for the case of mean squared error distortion, a more accurate analytical model for the distortion is derived by allowing the point density to have a singular component. The extent of the singularity is also characterized. These results provide analytical models for the expected distortion performance of both conventional VQ as well as for channel-optimized VQ. As a practical example, compression of the linear predictive coding parameters in the wideband speech spectrum is considered, with the log spectral distortion as performance metric. The theory is able to correctly predict the channel error rate that is permissible for operation at a particular level of distortion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective identification and description of mimicked calls is a primary component of any study on avian vocal mimicry but few studies have adopted a quantitative approach. We used spectral feature representations commonly used in human speech analysis in combination with various distance metrics to distinguish between mimicked and non-mimicked calls of the greater racket-tailed drongo, Dicrurus paradiseus and cross-validated the results with human assessment of spectral similarity. We found that the automated method and human subjects performed similarly in terms of the overall number of correct matches of mimicked calls to putative model calls. However, the two methods also misclassified different subsets of calls and we achieved a maximum accuracy of ninety five per cent only when we combined the results of both the methods. This study is the first to use Mel-frequency Cepstral Coefficients and Relative Spectral Amplitude - filtered Linear Predictive Coding coefficients to quantify vocal mimicry. Our findings also suggest that in spite of several advances in automated methods of song analysis, corresponding cross-validation by humans remains essential.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses the implementation details of a child friendly, good quality, English text-to-speech (TTS) system that is phoneme-based, concatenative, easy to set up and use with little memory. Direct waveform concatenation and linear prediction coding (LPC) are used. Most existing TTS systems are unit-selection based, which use standard speech databases available in neutral adult voices.Here reduced memory is achieved by the concatenation of phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis was used to reduce the algorithmic complexity instead of signal processing techniques. Sufficient degree of customization and generalization catering to the needs of the child user had been included through the provision for vocabulary and voice selection to suit the requisites of the child. Prosody had also been incorporated. This inexpensive TTS systemwas implemented inMATLAB, with the synthesis presented by means of a graphical user interface (GUI), thus making it child friendly. This can be used not only as an interesting language learning aid for the normal child but it also serves as a speech aid to the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper proposes a new compression algorithm for dynamic 3d meshes. In such a sequence of meshes, neighboring vertices have a strong tendency to behave similarly and the degree of dependencies between their locations in two successive frames is very large which can be efficiently exploited using a combination of Predictive and DCT coders (PDCT). Our strategy gathers mesh vertices of similar motions into clusters, establish a local coordinate frame (LCF) for each cluster and encodes frame by frame and each cluster separately. The vertices of each cluster have small variation over a time relative to the LCF. Therefore, the location of each new vertex is well predicted from its location in the previous frame relative to the LCF of its cluster. The difference between the original and the predicted local coordinates are then transformed into frequency domain using DCT. The resulting DCT coefficients are quantized and compressed with entropy coding. The original sequence of meshes can be reconstructed from only a few non-zero DCT coefficients without significant loss in visual quality. Experimental results show that our strategy outperforms or comes close to other coders.