8 resultados para speech information

em Aston University Research Archive


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Noise-vocoded (NV) speech is often regarded as conveying phonetic information primarily through temporal-envelope cues rather than spectral cues. However, listeners may infer the formant frequencies in the vocal-tract output—a key source of phonetic detail—from across-band differences in amplitude when speech is processed through a small number of channels. The potential utility of this spectral information was assessed for NV speech created by filtering sentences into six frequency bands, and using the amplitude envelope of each band (=30 Hz) to modulate a matched noise-band carrier (N). Bands were paired, corresponding to F1 (˜N1 + N2), F2 (˜N3 + N4) and the higher formants (F3' ˜ N5 + N6), such that the frequency contour of each formant was implied by variations in relative amplitude between bands within the corresponding pair. Three-formant analogues (F0 = 150 Hz) of the NV stimuli were synthesized using frame-by-frame reconstruction of the frequency and amplitude of each formant. These analogues were less intelligible than the NV stimuli or analogues created using contours extracted from spectrograms of the original sentences, but more intelligible than when the frequency contours were replaced with constant (mean) values. Across-band comparisons of amplitude envelopes in NV speech can provide phonetically important information about the frequency contours of the underlying formants.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The need for low bit-rate speech coding is the result of growing demand on the available radio bandwidth for mobile communications both for military purposes and for the public sector. To meet this growing demand it is required that the available bandwidth be utilized in the most economic way to accommodate more services. Two low bit-rate speech coders have been built and tested in this project. The two coders combine predictive coding with delta modulation, a property which enables them to achieve simultaneously the low bit-rate and good speech quality requirements. To enhance their efficiency, the predictor coefficients and the quantizer step size are updated periodically in each coder. This enables the coders to keep up with changes in the characteristics of the speech signal with time and with changes in the dynamic range of the speech waveform. However, the two coders differ in the method of updating their predictor coefficients. One updates the coefficients once every one hundred sampling periods and extracts the coefficients from input speech samples. This is known in this project as the Forward Adaptive Coder. Since the coefficients are extracted from input speech samples, these must be transmitted to the receiver to reconstruct the transmitted speech sample, thus adding to the transmission bit rate. The other updates its coefficients every sampling period, based on information of output data. This coder is known as the Backward Adaptive Coder. Results of subjective tests showed both coders to be reasonably robust to quantization noise. Both were graded quite good, with the Forward Adaptive performing slightly better, but with a slightly higher transmission bit rate for the same speech quality, than its Backward counterpart. The coders yielded acceptable speech quality of 9.6kbps for the Forward Adaptive and 8kbps for the Backward Adaptive.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the extensive use of pulse modulation methods in telecommunications, much work has been done in the search for a better utilisation of the transmission channel.The present research is an extension of these investigations. A new modulation method, 'Variable Time-Scale Information Processing', (VTSIP), is proposed.The basic principles of this system have been established, and the main advantages and disadvantages investigated. With the proposed system, comparison circuits detect the instants at which the input signal voltage crosses predetermined amplitude levels.The time intervals between these occurrences are measured digitally and the results are temporarily stored, before being transmitted.After reception, an inverse process enables the original signal to be reconstituted.The advantage of this system is that the irregularities in the rate of information contained in the input signal are smoothed out before transmission, allowing the use of a smaller transmission bandwidth. A disadvantage of the system is the time delay necessarily introduced by the storage process.Another disadvantage is a type of distortion caused by the finite store capacity.A simulation of the system has been made using a standard speech signal, to make some assessment of this distortion. It is concluded that the new system should be an improvement on existing pulse transmission systems, allowing the use of a smaller transmission bandwidth, but introducing a time delay.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Over recent years, evidence has been accumulating in favour of the importance of long-term information as a variable which can affect the success of short-term recall. Lexicality, word frequency, imagery and meaning have all been shown to augment short term recall performance. Two competing theories as to the causes of this long-term memory influence are outlined and tested in this thesis. The first approach is the order-encoding account, which ascribes the effect to the usage of resources at encoding, hypothesising that word lists which require less effort to process will benefit from increased levels of order encoding, in turn enhancing recall success. The alternative view, trace redintegration theory, suggests that order is automatically encoded phonologically, and that long-term information can only influence the interpretation of the resultant memory trace. The free recall experiments reported here attempted to determine the importance of order encoding as a facilitatory framework and to determine the locus of the effects of long-term information in free recall. Experiments 1 and 2 examined the effects of word frequency and semantic categorisation over a filled delay, and experiments 3 and 4 did the same for immediate recall. Free recall was improved by both long-term factors tested. Order information was not used over a short filled delay, but was evident in immediate recall. Furthermore, it was found that both long-term factors increased the amount of order information retained. Experiment 5 induced an order encoding effect over a filled delay, leaving a picture of short-term processes which are closely associated with long-term processes, and which fit conceptions of short-term memory being part of language processes rather better than either the encoding or the retrieval-based models. Experiments 6 and 7 aimed to determine to what extent phonological processes were responsible for the pattern of results observed. Articulatory suppression affected the encoding of order information where speech rate had no direct influence, suggesting that it is ease of lexical access which is the most important factor in the influence of long-term memory on immediate recall tasks. The evidence presented in this thesis does not offer complete support for either the retrieval-based account or the order encoding account of long-term influence. Instead, the evidence sits best with models that are based upon language-processing. The path urged for future research is to find ways in which this diffuse model can be better specified, and which can take account of the versatility of the human brain.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST is then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 - -10, was obtained for intelligibility and naturalness respectively.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics-for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition.