996 resultados para speech delay


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters, and; (c) is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state of the art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automatic speech recognition from multiple distant micro- phones poses significant challenges because of noise and reverberations. The quality of speech acquisition may vary between microphones because of movements of speakers and channel distortions. This paper proposes a channel selection approach for selecting reliable channels based on selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations. The new technique is compared experimentally in the real reverb conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). Overall improvement in recognition rate is observed using delay-sum and superdirective beamformers compared to the case when the channel is selected randomly using circular microphone arrays.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a novel approach to represent transients using spectral-domain amplitude-modulated/frequency -modulated (AM-FM) functions. The model is applied to the real and imaginary parts of the Fourier transform (FT) of the transient. The suitability of the model lies in the observation that since transients are well-localized in time, the real and imaginary parts of the Fourier spectrum have a modulation structure. The spectral AM is the envelope and the spectral FM is the group delay function. The group delay is estimated using spectral zero-crossings and the spectral envelope is estimated using a coherent demodulator. We show that the proposed technique is robust to additive noise. We present applications of the proposed technique to castanets and stop-consonants in speech.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We analyze the spectral zero-crossing rate (SZCR) properties of transient signals and show that SZCR contains accurate localization information about the transient. For a train of pulses containing transient events, the SZCR computed on a sliding window basis is useful in locating the impulse locations accurately. We present the properties of SZCR on standard stylized signal models and then show how it may be used to estimate the epochs in speech signals. We also present comparisons with some state-of-the-art techniques that are based on the group-delay function. Experiments on real speech show that the proposed SZCR technique is better than other group-delay-based epoch detectors. In the presence of noise, a comparison with the zero-frequency filtering technique (ZFF) and Dynamic programming projected Phase-Slope Algorithm (DYPSA) showed that performance of the SZCR technique is better than DYPSA and inferior to that of ZFF. For highpass-filtered speech, where ZFF performance suffers drastically, the identification rates of SZCR are better than those of DYPSA.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Transient signals such as plosives in speech or Castanets in audio do not have a specific modulation or periodic structure in time domain. However, in the spectral domain they exhibit a prominent modulation structure, which is a direct consequence of their narrow time localization. Based on this observation, a spectral-domain AM-FM model for transients is proposed. The spectral AM-FM model is built starting from real spectral zero-crossings. The AM and FM correspond to the spectral envelope (SE) and group delay (GD), respectively. Taking into account the modulation structure and spectral continuity, a local polynomial regression technique is proposed to estimate the GD function from the real spectral zeros. The SE is estimated based on the phase function computed from the estimated GD. Since the GD estimation is parametric, the degree of smoothness can be controlled directly. Simulation results based on synthetic transient signals generated using a beta density function are presented to analyze the noise-robustness of the SEGD model. Three specific applications are considered: (1) SEGD based modeling of Castanet sounds; (2) appropriateness of the model for transient compression; and (3) determining glottal closure instants in speech using a short-time SEGD model of the linear prediction residue.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provide excellent perceived quality, features for the amplitude parameters already exist (e.g., Line Spectral Frequencies (LSF), Mel-Frequency Cepstral Coefficients (MFCC)). However, because of the wrapping of the phase parameters, phase features are more difficult to design. To randomize the phase of the harmonic model during synthesis, a voicing feature is commonly used, which distinguishes voiced and unvoiced segments. However, voice production allows smooth transitions between voiced/unvoiced states which makes voicing segmentation sometimes tricky to estimate. In this article, two-phase features are suggested to represent the phase of the harmonic model in a uniform way, without voicing decision. The synthesis quality of the resulting vocoder has been evaluated, using subjective listening tests, in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis. The experiments show that the suggested signal model is comparable to STRAIGHT or even better in some scenarios. They also reveal some limitations of the harmonic framework itself in the case of high fundamental frequencies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The current study investigated the effects that barriers (both real and perceived) had on participation and completion of speech and language programs for preschool children with communication delays. I compared 36 families of preschool children with an identified communication delay that have completed services (completers) to 13 families that have not completed services (non-completers) prescribed by Speech and Language professionals. Data findings reported were drawn from an interview with the mother, a speech and language assessment of the child, and an extensive package of measures completed by the mother. Children ranged in age from 32 to 71 mos. These data were collected as part of a project funded by the Canadian Language and Literacy Research Networks of Centres of Excellence. Findings suggest that completers and non-completers shared commonalities in a number of parenting characteristics but differed significantly in two areas. Mothers in the noncompleting group were more permissive and had lower maternal education than mothers in the completing families. From a systemic standpoint, families also differed in the number of perceived barriers to treatment experienced during their time with Speech Services Niagara. Mothers in the non-completing group experienced more perceived barriers to treatment than completing mothers. Specifically, these mothers perceived more stressors and obstacles that competed with treatment, perceived more treatment demands and they perceived the relevance of treatment as less important than the completing group. Despite this, the findings suggest that non-completing families were 100% satisfied with services. Contrary to predictions, there were no significant differences in child characterisfics and economic characteristics between completers and non-completers. The findings in this study are considered exploratory and tentative due to the small sample size.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed, which reduces the computational burden and complexity without sacrificing the speech quality, as compared to the conventional adaptive predictive coding (APC) system. For this, changes in the evaluation methods have been evolved. This method is as different from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the replacement of the high order predictor in the transmitter section of a predictive coding system, by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in the processing of the speech signal is adjusted relative to the pitch period of the signal being processed rather than choosing a constant length as hitherto done by other researchers. The efficiency of the newly proposed coder has been supported with results of computer simulation using real speech data. Three methods for voiced/unvoiced/silent/transition classification have been presented. The first one is based on energy, zerocrossing rate and the periodicity of the waveform. The second method uses normalised correlation coefficient as the main parameter, while the third method utilizes a pitch-dependent correlation factor. The third algorithm which gives the minimum error probability has been chosen in a later chapter to design the modified coder The thesis also presents a comparazive study beh-cm the autocorrelation and the covariance methods used in the evaluaiicn of the predictor parameters. It has been proved that the azztocorrelation method is superior to the covariance method with respect to the filter stabf-it)‘ and also in an SNR sense, though the increase in gain is only small. The Modified Block Adaptive Coder applies a switching from pitch precitzion to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments cont;-:ted in coding, transmission and simulation, used speech samples from .\£=_‘ajr2_1a:r1 and English phrases. Proposal for a speaker reecgnifion syste: and a phoneme identification system has also been outlized towards the end of the thesis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The phenotype of partial trisomy 9p includes global developmental delay, microcephaly, bulbous nose, downturned oral commissures, malformed ears, hypotonia, and severe cognitive and language disorders. We present a case report and a comparative review of clinical findings on this condition, focusing on speech-language development, cognitive abilities and swallowing evaluation. We suggest that oropharyngeal dysphagia should be further investigated, considering that pulmonary and nutritional disorders affect the survival and quality of life of the patient. As far as we know, this is the first study of a patient with partial trisomy 9p described with oropharyngeal dysphagia.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Speech melody or prosody subserves linguistic, emotional, and pragmatic functions in speech communication. Prosodic perception is based on the decoding of acoustic cues with a predominant function of frequency-related information perceived as speaker's pitch. Evaluation of prosodic meaning is a cognitive function implemented in cortical and subcortical networks that generate continuously updated affective or linguistic speaker impressions. Various brain-imaging methods allow delineation of neural structures involved in prosody processing. In contrast to functional magnetic resonance imaging techniques, DC (direct current, slow) components of the EEG directly measure cortical activation without temporal delay. Activation patterns obtained with this method are highly task specific and intraindividually reproducible. Studies presented here investigated the topography of prosodic stimulus processing in dependence on acoustic stimulus structure and linguistic or affective task demands, respectively. Data obtained from measuring DC potentials demonstrated that the right hemisphere has a predominant role in processing emotions from the tone of voice, irrespective of emotional valence. However, right hemisphere involvement is modulated by diverse speech and language-related conditions that are associated with a left hemisphere participation in prosody processing. The degree of left hemisphere involvement depends on several factors such as (i) articulatory demands on the perceiver of prosody (possibly, also the poser), (ii) a relative left hemisphere specialization in processing temporal cues mediating prosodic meaning, and (iii) the propensity of prosody to act on the segment level in order to modulate word or sentence meaning. The specific role of top-down effects in terms of either linguistically or affectively oriented attention on lateralization of stimulus processing is not clear and requires further investigations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

OBJECTIVE To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

OBJECTIVES The objectives of the present study were to investigate temporal/spectral sound-feature processing in preschool children (4 to 7 years old) with peripheral hearing loss compared with age-matched controls. The results verified the presence of statistical learning, which was diminished in children with hearing impairments (HIs), and elucidated possible perceptual mediators of speech production. DESIGN Perception and production of the syllables /ba/, /da/, /ta/, and /na/ were recorded in 13 children with normal hearing and 13 children with HI. Perception was assessed physiologically through event-related potentials (ERPs) recorded by EEG in a multifeature mismatch negativity paradigm and behaviorally through a discrimination task. Temporal and spectral features of the ERPs during speech perception were analyzed, and speech production was quantitatively evaluated using speech motor maximum performance tasks. RESULTS Proximal to stimulus onset, children with HI displayed a difference in map topography, indicating diminished statistical learning. In later ERP components, children with HI exhibited reduced amplitudes in the N2 and early parts of the late disciminative negativity components specifically, which are associated with temporal and spectral control mechanisms. Abnormalities of speech perception were only subtly reflected in speech production, as the lone difference found in speech production studies was a mild delay in regulating speech intensity. CONCLUSIONS In addition to previously reported deficits of sound-feature discriminations, the present study results reflect diminished statistical learning in children with HI, which plays an early and important, but so far neglected, role in phonological processing. Furthermore, the lack of corresponding behavioral abnormalities in speech production implies that impaired perceptual capacities do not necessarily translate into productive deficits.