994 resultados para Vector quantization


Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these productcode vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of lowrate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scales up to very large data sets. Pyktree is highly modular and well suited for rapid-prototyping of novel distance measures and centroid representations. It is easy to install and provides a python package for library use as well as command line tools.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Using analysis-by-synthesis (AbS) approach, we develop a soft decision based switched vector quantization (VQ) method for high quality and low complexity coding of wideband speech line spectral frequency (LSF) parameters. For each switching region, a low complexity transform domain split VQ (TrSVQ) is designed. The overall rate-distortion (R/D) performance optimality of new switched quantizer is addressed in the Gaussian mixture model (GMM) based parametric framework. In the AbS approach, the reduction of quantization complexity is achieved through the use of nearest neighbor (NN) TrSVQs and splitting the transform domain vector into higher number of subvectors. Compared to the current LSF quantization methods, the new method is shown to provide competitive or better trade-off between R/D performance and complexity.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We develop a Gaussian mixture model (GMM) based vector quantization (VQ) method for coding wideband speech line spectrum frequency (LSF) parameters at low complexity. The PDF of LSF source vector is modeled using the Gaussian mixture (GM) density with higher number of uncorrelated Gaussian mixtures and an optimum scalar quantizer (SQ) is designed for each Gaussian mixture. The reduction of quantization complexity is achieved using the relevant subset of available optimum SQs. For an input vector, the subset of quantizers is chosen using nearest neighbor criteria. The developed method is compared with the recent VQ methods and shown to provide high quality rate-distortion (R/D) performance at lower complexity. In addition, the developed method also provides the advantages of bitrate scalability and rate-independent complexity.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We revisit the problem of temporal self organization using activity diffusion based on the neural gas (NGAS) algorithm. Using a potential function formulation motivated by a spatio-temporal metric, we derive an adaptation rule for dynamic vector quantization of data. Simulations results show that our algorithm learns the input distribution and time correlation much faster compared to the static neural gas method over the same data sequence under similar training conditions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

As a recently developed and powerful classification tool, probabilistic neural network was used to distinguish cancer patients from healthy persons according to the levels of nucleosides in human urine. Two datasets (containing 32 and 50 patterns, respectively) were investigated and the total consistency rate obtained was 100% for dataset 1 and 94% for dataset 2. To evaluate the performance of probabilistic neural network, linear discriminant analysis and learning vector quantization network, were also applied to the classification problem. The results showed that the predictive ability of the probabilistic neural network is stronger than the others in this study. Moreover, the recognition rate for dataset 2 can achieve to 100% if combining, these three methods together, which indicated the promising potential of clinical diagnosis by combining different methods. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

小尺寸目标跟踪是视觉跟踪中的难题。该文首先指出了均值移动小尺寸目标跟踪算法中的两个主要问题:算法跟踪中断和丢失跟踪目标。然后,论文给出了相应的解决方法。对传统Parzen窗密度估计法加以改进,并用于对候选目标区域的直方图进行插值处理,较好地解决了算法跟踪中断问题。论文采用Kullback-Leibler距离作为目标模型和候选目标之间的新型相似性度量函数,并推导了其相应的权值和新位置计算公式,提高了算法的跟踪精度。多段视频序列的跟踪实验表明,该文提出的算法可以有效地跟踪小尺寸目标,能够成功跟踪只有6×12个像素的小目标,跟踪精度也有一定提高。

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present some extensions to the k-means algorithm for vector quantization that permit its efficient use in image segmentation and pattern classification tasks. It is shown that by introducing state variables that correspond to certain statistics of the dynamic behavior of the algorithm, it is possible to find the representative centers fo the lower dimensional maniforlds that define the boundaries between classes, for clouds of multi-dimensional, mult-class data; this permits one, for example, to find class boundaries directly from sparse data (e.g., in image segmentation tasks) or to efficiently place centers for pattern classification (e.g., with local Gaussian classifiers). The same state variables can be used to define algorithms for determining adaptively the optimal number of centers for clouds of data with space-varying density. Some examples of the applicatin of these extensions are also given.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

King, R. D. and Ouali, M. (2004) Poly-transformation. In proceedings of 5th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2004). Springer LNCS 3177 p99-107

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A bit-level systolic array system for performing a binary tree vector quantization (VQ) codebook search is described. This is based on a highly regular VLSI building block circuit. The system in question exhibits a very high data rate suitable for a range of real-time applications. A technique is described which reduces the storage requirements of such a system by 50%, with a corresponding decrease in hardware complexity.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A number of high-performance VLSI architectures for real-time image coding applications are described. In particular, attention is focused on circuits for computing the 2-D DCT (discrete cosine transform) and for 2-D vector quantization. The former circuits are based on Winograd algorithms and comprise a number of bit-level systolic arrays with a bit-serial, word-parallel input. The latter circuits exhibit a similar data organization and consist of a number of inner product array circuits. Both circuits are highly regular and allow extremely high data rates to be achieved through extensive use of parallelism.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A bit-level systolic array system for performing a binary tree Vector Quantization codebook search is described. This consists of a linear chain of regular VLSI building blocks and exhibits data rates suitable for a wide range of real-time applications. A technique is described which reduces the computation required at each node in the binary tree to that of a single inner product operation. This method applies to all the common distortion measures (including the Euclidean distance, the Weighted Euclidean distance and the Itakura-Saito distortion measure) and significantly reduces the hardware required to implement the tree search system. © 1990 Kluwer Academic Publishers.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Artificial neural network (ANN) methods are used to predict forest characteristics. The data source is the Southeast Alaska (SEAK) Grid Inventory, a ground survey compiled by the USDA Forest Service at several thousand sites. The main objective of this article is to predict characteristics at unsurveyed locations between grid sites. A secondary objective is to evaluate the relative performance of different ANNs. Data from the grid sites are used to train six ANNs: multilayer perceptron, fuzzy ARTMAP, probabilistic, generalized regression, radial basis function, and learning vector quantization. A classification and regression tree method is used for comparison. Topographic variables are used to construct models: latitude and longitude coordinates, elevation, slope, and aspect. The models classify three forest characteristics: crown closure, species land cover, and tree size/structure. Models are constructed using n-fold cross-validation. Predictive accuracy is calculated using a method that accounts for the influence of misclassification as well as measuring correct classifications. The probabilistic and generalized regression networks are found to be the most accurate. The predictions of the ANN models are compared with a classification of the Tongass national forest in southeast Alaska based on the interpretation of satellite imagery and are found to be of similar accuracy.