925 results for Compressed speech


Relevance:

70.00%

Publisher:

Abstract:

This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification.
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
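
The pipeline summarized in this abstract (product-code VQ of the spectrum, followed by lossless coding of the codebook indices) can be illustrated with a small sketch. It assumes a split-vector structure, which is only one kind of product code and may not be the one used in the thesis. The function names, the toy codebooks, and the use of empirical entropy as a stand-in for the achievable lossless rate are all illustrative assumptions.

```python
import math
from collections import Counter


def nearest_index(vector, codebook):
    """Full-search VQ: index of the codeword with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def split_vq_encode(vector, codebooks, split_sizes):
    """Split (product-code) VQ: quantize each sub-vector with its own small codebook."""
    indices, start = [], 0
    for codebook, size in zip(codebooks, split_sizes):
        indices.append(nearest_index(vector[start:start + size], codebook))
        start += size
    return indices


def empirical_entropy_bits(symbols):
    """Average bits per symbol an ideal entropy coder would need for this index stream."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def bits_per_second(bits_per_frame, frame_ms):
    """Convert a per-frame bit budget into a bit rate."""
    return bits_per_frame * 1000 / frame_ms


# Toy (hypothetical) codebooks: a 2-D sub-vector and a 1-D sub-vector.
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0,), (2.0,)]]
print(split_vq_encode([0.9, 1.2, 0.3], codebooks, [2, 1]))
print(bits_per_second(24, 20), bits_per_second(20, 20))
```

With the figures quoted in the abstract, 24 bits per 20 ms frame corresponds to 1200 bit/s, and a budget below 20 bits per frame corresponds to less than 1000 bit/s. A skewed index distribution has an empirical entropy below the fixed index length, which is the margin that lossless coding of the indices can exploit.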

Relevance:

70.00%

Publisher:

Abstract:

Thesis (M.S.)--University of Illinois at Urbana-Champaign, 1977.

Relevance:

60.00%

Publisher:

Abstract:

The lack of standardized tests of central auditory processing disorder (CAPD) in South Africa (SA) led to the formation of a SA CAPD Taskforce, and the interim development of a "low linguistically loaded" CAPD test protocol using test recordings from the 'Tonal and Speech Materials for Auditory Perceptual Assessment Disc 2.0'. This study inferentially compared the performance of 16 SA English first, and 16 SA English second, language adult speakers on this test protocol, and descriptively compared their performances to previously published American normative data. Comparisons between the SA English first and second language speakers showed a poorer right ear performance (p < .05) by the second language speakers on the two-pair dichotic digits test only. Equivalent performances (p < .05) were observed on the left ear performance on the two-pair dichotic digits test, and the frequency patterns test, the duration patterns test, the low-pass filtered speech test, the 45% time-compressed speech test, the speech masking level difference test, and the consonant-vowel-consonant (CVC) binaural fusion test. Comparisons between the SA English and the American normative data showed many large differences (up to 37.1% with respect to predicted pass criteria as calculated by mean-2SD cutoffs), with the SA English speakers performing both better and worse depending on the test involved. As a result, the American normative data were not considered appropriate for immediate use as normative data in SA. Instead, the preliminary data provided in this study were recommended as interim normative data for both SA English first and second language adult speakers, until larger-scale SA normative data can be obtained.
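
The "mean-2SD" pass criterion mentioned in this abstract can be written out as a short calculation. This is a minimal sketch: the scores below are invented for illustration and are not data from the study, and the choice of the sample (n-1) standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import statistics


def pass_cutoff(scores):
    """Pass criterion: mean minus two standard deviations (sample SD assumed)."""
    return statistics.mean(scores) - 2 * statistics.stdev(scores)


# Hypothetical percent-correct scores, for illustration only.
example_scores = [90, 92, 94, 96, 98]
print(round(pass_cutoff(example_scores), 2))
```

A listener scoring below the cutoff would fail that test under this criterion, so two normative samples with different means or spreads can yield quite different pass marks for the same test.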

Relevance:

30.00%

Publisher:

Abstract:

The paper describes a modular, unit selection based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. The synthesis database consists of 1027 phonetically rich prerecorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications like mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resultant synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that with the uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, thus proposing to exploit the misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with their confused ones.

Relevance:

30.00%

Publisher:

Abstract:

In this paper, we have proposed a simple and effective approach to classify H.264 compressed videos, by capturing orientation information from the motion vectors. Our major contribution involves computing Histogram of Oriented Motion Vectors (HOMV) for overlapping hierarchical Space-Time cubes. The Space-Time cubes selected are partially overlapped. HOMV is found to be very effective to define the motion characteristics of these cubes. We then use a Bag of Features (BoF) approach to define the video as a histogram of HOMV keywords, obtained using k-means clustering. The video feature, thus computed, is found to be very effective in classifying videos. We demonstrate our results with experiments on two large publicly available video databases.
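
The core of the HOMV descriptor, an orientation histogram over the motion vectors of one Space-Time cube, might be sketched as follows. The number of bins, the magnitude weighting, and the normalization are assumptions made for illustration; the abstract does not specify them. The hierarchical cube layout and the k-means step that produces the BoF keywords are not shown.

```python
import math


def homv(motion_vectors, num_bins=8, weight_by_magnitude=True):
    """Orientation histogram of (dx, dy) motion vectors, normalized to sum to 1."""
    hist = [0.0] * num_bins
    bin_width = 2 * math.pi / num_bins
    for dx, dy in motion_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0:
            continue  # zero vectors carry no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        index = min(int(angle / bin_width), num_bins - 1)
        hist[index] += magnitude if weight_by_magnitude else 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical motion vectors for one cube, for illustration only.
cube_vectors = [(1, 1), (-1, 1), (-1, 1), (1, -1), (0, 0)]
print(homv(cube_vectors, num_bins=4, weight_by_magnitude=False))
```

Each cube then contributes one such histogram, and clustering these histograms across a training set would give the keywords from which the per-video BoF histogram is built.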

Relevance:

30.00%

Publisher:

Abstract:

Numerous algorithms have been proposed recently for sparse signal recovery in Compressed Sensing (CS). In practice, the number of measurements can be very limited due to the nature of the problem and/or the underlying statistical distribution of the non-zero elements of the sparse signal may not be known a priori. It has been observed that the performance of any sparse signal recovery algorithm depends on these factors, which makes the selection of a suitable sparse recovery algorithm difficult. To address such situations, we propose a fusion framework in which we employ multiple sparse signal recovery algorithms and fuse their estimates to obtain a better estimate. Theoretical results justifying the performance improvement are shown. The efficacy of the proposed scheme is demonstrated by Monte Carlo simulations using synthetic sparse signals and ECG signals selected from the MIT-BIH database.

Relevance:

30.00%

Publisher:

Abstract:

Many problems in digital communications involve wideband radio signals. As the most recent example, the impressive advances in Cognitive Radio systems make the development of sampling schemes for wideband radio signals with spectral holes even more necessary. This is equivalent to considering a sparse multiband signal in the framework of Compressive Sampling theory. Starting from previous results on multicoset sampling and recent advances in compressive sampling, we analyze the matrix involved in the corresponding reconstruction equation and define a new method for the design of universal multicoset codes, that is, codes guaranteeing perfect reconstruction of the sparse multiband signal.

Relevance:

20.00%

Publisher: