996 results for Speech acoustics
Abstract:
We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with conventional acoustic features. On text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves performance dramatically. However, since directly measuring articulatory data is not feasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion methods and find that overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong-password trials and improve performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks. (C) 2015 Elsevier Ltd. All rights reserved.
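As a rough illustration of two quantities mentioned in this abstract, the sketch below shows one common form of score-level fusion (a weighted sum of per-trial scores) and how a relative equal error rate (EER) reduction is computed. The abstract does not state which fusion rule or weights the authors used, so the weighted sum, the `weight` parameter, and both function names are assumptions made only for illustration.

```python
def fuse_scores(acoustic, articulatory, weight=0.5):
    """Weighted-sum fusion of two per-trial score lists.

    Assumption: a simple linear combination; the abstract does not
    specify the fusion rule actually used.
    """
    if len(acoustic) != len(articulatory):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(acoustic, articulatory)]


def relative_eer_reduction(baseline_eer, fused_eer):
    """Relative reduction of the EER with respect to a baseline system."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - fused_eer) / baseline_eer
```

With hypothetical numbers, a baseline EER of 10.0% and a fused EER of 8.5% give a relative reduction of 0.15, that is, the 15% threshold the abstract refers to.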
Abstract:
The affective impact of music arises from a variety of factors, including intensity, tempo, rhythm, and tonal relationships. The emotional coloring evoked by intensity, tempo, and rhythm appears to arise from association with the characteristics of human behavior in the corresponding condition; however, how and why particular tonal relationships in music convey distinct emotional effects are not clear. The hypothesis examined here is that major and minor tone collections elicit different affective reactions because their spectra are similar to the spectra of voiced speech uttered in different emotional states. To evaluate this possibility the spectra of the intervals that distinguish major and minor music were compared to the spectra of voiced segments in excited and subdued speech using fundamental frequency and frequency ratios as measures. Consistent with the hypothesis, the spectra of major intervals are more similar to spectra found in excited speech, whereas the spectra of particular minor intervals are more similar to the spectra of subdued speech. These results suggest that the characteristic affective impact of major and minor tone collections arises from associations routinely made between particular musical intervals and voiced speech.
Abstract:
The study of acoustic communication in animals often requires not only the recognition of species-specific acoustic signals but also the identification of individual subjects, all in a complex acoustic background. Moreover, when very long recordings are to be analyzed, automatic recognition and identification processes are invaluable tools for extracting the relevant biological information. A pattern recognition methodology based on hidden Markov models is presented, inspired by successful results obtained with the most widely known and complex acoustic communication signal: human speech. This methodology was applied here for the first time to the detection and recognition of fish acoustic signals, specifically in a stream of round-the-clock recordings of Lusitanian toadfish (Halobatrachus didactylus) in their natural estuarine habitat. The results show that this methodology is able not only to detect the mating sounds (boatwhistles) but also to identify individual male toadfish, reaching an identification rate of ca. 95%. Moreover, this method also proved to be a powerful tool for assessing signal durations in large data sets. However, the system failed to recognize other sound types.
Abstract:
Objective. To compare the voice performance of children involved in street labor with regular children using perceptual-auditory and acoustic analyses. Methods. A controlled cross-sectional study was carried out on 7- to 10-year-old children of both genders. Children from both groups lived with their families and attended school regularly; however, child labor was evident in one group and not the other. A total of 200 potentially eligible street children, assisted by the Child Labor Elimination Programme (PETI), and 400 regular children were interviewed. Those with any vocal discomfort (106, 53% and 90, 22.5%) had their voices assessed for resonance, pitch, loudness, speech rate, maximum phonation time, and other acoustic measurements. Results. A total of 106 street children (study group [SG]) and 90 regular children (control group [CG]) were evaluated. The SG demonstrated higher oral and nasal resonance, reduced loudness, a lower pitch, and a slower speech rate than the CG. The maximum phonation time, fundamental frequency, and upper harmonics were higher in the SG than in the CG. Jitter and shimmer were higher in the CG than in the SG. Conclusion. Using perceptual-auditory and acoustic analyses, we determined that there were differences in voice performance between the two groups, with street children having better perceptual and acoustic vocal parameters than regular children. We believe that this is due to the procedures and activities performed by the Child Labor Elimination Programme (PETI), which helps children to cope with their living conditions.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
TOPIC: acoustic analysis of speech. OBJECTIVE: to acoustically analyze substitutions involving the contrast between /t/ and /k/ in the speech of children with typical and deviant acquisition of this contrast, in order to identify and quantify the existence of covert contrasts. METHOD: a speech production experiment was designed involving the repetition of words that combined /t/ and /k/ with /a/ and /u/ in stressed position, by 9 children divided into three groups: children in the process of acquiring the contrast under investigation (G1); children with phonological disorder (G2); and children with typical productions (G3). Using the Praat software, the productions were edited and analyzed according to the following acoustic parameters: spectral characteristics of the burst; CV transition; and temporal characteristics. The statistical tests used were Friedman's ANOVA and MANOVA. The significance level adopted was below 0.05. RESULTS: in the productions of both the G2 and the G1 children, we detected, to a large extent (80% and 57.4%, respectively), the presence of covert contrasts in the substitution errors of the stops investigated. Additionally, the acoustic analysis revealed differences in how the children use phonetic-acoustic cues to mark the distinction between /t/ and /k/. CONCLUSION: many of the substitutions present in the speech production of children undergoing typical and deviant acquisition are in fact covert phonic contrasts. Furthermore, the use of acoustic analysis allowed the detection of subtle differences in the children's speech production.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Cleft palate associated with Pierre Robin Sequence may favor the development of atypical (compensatory) productions in a child's speech, such as the glottal stop, which is commonly observed as a substitute for stop consonants (voiced or voiceless). In the present study, the phonetic-acoustic parameters of glottal stops produced for /k/ and /g/ were analyzed in a 5-year-old girl with repaired cleft palate associated with Pierre Robin Sequence. For this purpose, six words were selected in which the velar stop was in word-initial position and combined with the vowels /a/, /i/ and /u/ in stressed position. A perceptual-auditory judgment was also carried out by three speech-language pathologists, who showed 100% agreement on the presence of the glottal stop for both intra- and inter-judge comparisons. Inspection of the data via spectrogram showed variability in the spectral parameters (burst and formant transition), and these variations could also be computed considering the vowels separately. Statistical analysis revealed a statistically significant difference between the two velar consonants (/k/ and /g/) in the spectral parameter (burst), the temporal parameters (VOT and relative duration of the stop within the word), and the parameters related to the acoustic characteristics of the vowels adjacent to the stops (steady-state period of F3). Finally, the acoustic characteristics of the glottal stop suggested that the child may have used strategies to mark phonic contrasts in the language, even though these contrasts are not of sufficient magnitude to be recovered auditorily by the listener.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The use of prosthetic devices for correction of velopharyngeal insufficiency (VPI) is an alternative treatment for patients with conditions that preclude surgery and for individuals with a hypofunctional velopharynx (HV) with a poor prognosis for the surgical repair of VPI. Understanding the role and measuring the outcome of prosthetic treatment of velopharyngeal dysfunction requires tools that allow pre- and post-treatment outcomes to be documented. Experimental openings in speech bulbs have been used to simulate VPI in studies documenting changes in aerodynamic, acoustic and kinematic aspects of speech associated with the use of palatal prosthetic devices. The use of nasometry to document changes in speech associated with experimental openings in speech bulbs, however, has not been described in the literature. Objective: This single-subject study investigated nasalance and nasality in the presence of experimental openings drilled through the speech bulb of a patient with HV. Material and Methods: Nasometric recordings of the word "pato" were obtained under 4 velopharyngeal conditions: no opening (control condition), no speech bulb, speech bulb with a 20 mm² opening, and speech bulb with a 30 mm² opening. Five speech-language pathologists performed auditory-perceptual ratings while the subject read an oral passage under all conditions. Results: The Kruskal-Wallis test showed a significant difference among conditions (p=0.0002), with the Scheffe post hoc test indicating a difference from the no-opening condition. Conclusion: The changes in nasalance observed after drilling holes of known sizes in a speech bulb suggest that nasometry reflects changes in the transfer of sound energy related to different sizes of velopharyngeal opening.
Abstract:
OBJECTIVE: to study the fundamental frequency value and its variations in the pain cry of newborns. METHODS: the vocalizations of 111 healthy full-term newborns, aged 24 to 72 hours, were recorded during a peripheral venipuncture procedure. Acoustic analysis was performed using the software VOXMETRIA 1.1, to extract the fundamental frequency value, and GRAM 5.7, to check for the occurrence of fundamental frequency variations such as breaks, bitonality and very high-pitched (hyperacute) frequency. The NIPS pain scale was applied at the time of the puncture. The statistical analysis is descriptive, with extraction of the mean, standard deviation and frequency of occurrence of the events. RESULTS: the newborns showed frequency variations, that is, breaks and bitonality, in 100% of their vocalizations. Hyperacute frequency was found in 34.2% of the newborns. CONCLUSION: through crying, the newborn communicates pain. The newborn's pain vocalization is tense and strident, with a high fundamental frequency and variations visible in the spectrographic trace, such as breaks, bitonality and hyperacute frequency. These characteristics are important for drawing the adult's attention to prompt care of the newborn and for assisting in pain assessment during a procedure.
Abstract:
Spectrographic analysis of male actors' voices showed a cluster, the actor's formant (AF), which is related to the perception of good and projected voice quality. To date, similar phenomena have not been described in the voices of actresses. Therefore, the objective of the current investigation was to compare actresses' and nonactresses' voices through acoustic analysis to verify the existence of the AF cluster or the strategies used to produce the performing voice. Thirty actresses and 30 nonactresses volunteered as subjects in the present study. All subjects read a 40-second text at both habitual and loud levels. Praat (v.5.1) was then used to analyze equivalent sound pressure level (Leq), speaking fundamental frequency (SFF), and in the long-term average spectrum window, the difference between the amplitude level of the fundamental frequency and first formant (L1 - L0), the spectral tilt (alpha ratio), and the amplitude and frequency of the AF region. Significant differences between the groups, in both levels, were observed for SFF and L1 - L0, with actresses presenting lower values. There were no significant differences between groups for Leq or alpha ratio at either level. There was no evidence of an AF cluster in the actresses' voices. Voice projection for this group of actresses seemed to be mainly a result of a laryngeal setting instead of vocal tract resonances.
Abstract:
This study investigates the possible differences between actors' and nonactors' vocal projection strategies using acoustic and perceptual analyses. A total of 11 male actors and 10 male nonactors volunteered as subjects, reading an extended text sample at habitual, moderate, and loud levels. The samples were analyzed for sound pressure level (SPL), alpha ratio (difference between the average SPL of the 1-5 kHz region and the average SPL of the 50 Hz-1 kHz region), fundamental frequency (F0), and long-term average spectrum (LTAS). Through LTAS, the mean frequency of the first formant (F1) range, the mean frequency of the actor's formant, the level differences between the F1 frequency region and the F0 region (L1-L0), and the level differences between the strongest peak at 0-1 kHz and that at 3-4 kHz were measured. Eight voice specialists perceptually evaluated the degree of projection, loudness, and tension in the samples. The actors had a greater alpha ratio, a stronger level of the actor's formant range, and a higher degree of perceived projection and loudness at all loudness levels. SPL, however, did not differ significantly between the actors and nonactors, and no differences were found in the mean formant frequency ranges. The alpha ratio and the relative level of the actor's formant range seemed to be related to the degree of perceived loudness. From the physiological point of view, a more favorable glottal setting, providing a higher glottal closing speed, may be characteristic of these actors' projected voices. Thus, the projected voices in this group of actors were more related to the glottic source than to the resonance of the vocal tract.
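The alpha ratio defined in this abstract can be sketched as a small function over an already computed LTAS. This is only a sketch under stated assumptions: the LTAS is given as parallel lists of bin center frequencies and levels in dB, the band average is taken directly over the dB values, and a bin at exactly 1 kHz is counted in the upper band. The abstract does not specify these details, and the names `mean_band_level` and `alpha_ratio` are hypothetical.

```python
def mean_band_level(freqs_hz, levels_db, lo_hz, hi_hz):
    """Average level (dB) of the LTAS bins with lo_hz <= f < hi_hz.

    Assumption: dB values are averaged directly, not converted to power.
    """
    band = [lvl for f, lvl in zip(freqs_hz, levels_db) if lo_hz <= f < hi_hz]
    if not band:
        raise ValueError("no LTAS bins fall inside the requested band")
    return sum(band) / len(band)


def alpha_ratio(freqs_hz, levels_db):
    """Average level in 1-5 kHz minus average level in 50 Hz-1 kHz."""
    high = mean_band_level(freqs_hz, levels_db, 1000.0, 5000.0)
    low = mean_band_level(freqs_hz, levels_db, 50.0, 1000.0)
    return high - low
```

A less negative (greater) value means relatively more energy above 1 kHz, which is the direction of the difference reported for the actors.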
Abstract:
BACKGROUND: One of the great difficulties in evaluating a voice is judging its quality through perceptual-auditory analysis, which, although frequently used, is influenced by socioeconomic and cultural aspects as well as individual preferences. Many adjectives and methods are used in this assessment, largely because of the subjectivity involved, which leads to disagreement between listeners and difficulty in reaching a consensus on terminology. In this context, the voice laboratory, and more specifically computerized acoustic analysis, has guided and complemented speech-language treatments. Among the several possibilities of spectrographic analysis, the Long-Term Average Spectrum (LTAS) quantifies voice quality, pointing out differences related to gender, age, professional (spoken and sung) voices, and dysphonic voices. The LTAS has been widely used in voice research. Because it shows the contribution of the glottal source and of resonance to voice quality, it provides objective parameters for evaluating an aspect that usually depends on auditory perception. AIM: to demonstrate how LTAS can be applied in voice research and in speech-language therapy practice, describing the technical aspects required for producing and interpreting results, as well as its limitations. CONCLUSION: Voice research has developed considerably in the last two decades, especially because of the advent of the voice and speech laboratory. For this reason, knowledge of the applicability of additional voice analysis tools such as the LTAS, together with the existing need for more studies in this area, will most likely contribute to the creation of new research areas, not only in the field of professional voice but also in the field of therapy.
Abstract:
The present study aimed to compare elderly and young female voices at habitual and high intensity. The effect of increased intensity on the acoustic and perceptual parameters was assessed. Sound pressure level, fundamental frequency, jitter, shimmer, and harmonic-to-noise ratio were obtained at habitual and high intensity in a group of 30 elderly women and 30 young women. Perceptual assessment was also performed. Both groups demonstrated an increase in sound pressure level and fundamental frequency from habitual voice to high-intensity voice. No differences were found between groups in any acoustic variable on samples recorded at the habitual intensity level. No significant differences between groups were found at the habitual intensity level for pitch, hoarseness, roughness, and breathiness. Asthenia and instability were significantly higher in elderly than in young participants, whereas the elderly demonstrated lower values for perceived tension and loudness than young subjects. Acoustic and perceptual measures do not demonstrate evident differences between elderly and young speakers at the habitual intensity level. The parameters analyzed may lack the sensitivity necessary to detect differences in subjects with normal voices. Phonation at high intensity highlights differences between groups, especially in perceptual parameters. Therefore, high intensity should be included when comparing elderly and young voices.
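For readers unfamiliar with the jitter and shimmer measures named in this abstract, the sketch below computes one widely used variant, the "local" perturbation: the mean absolute difference between consecutive cycles divided by the mean value. The abstract does not say which jitter or shimmer definition the authors used, so this choice, and the function names, are assumptions; the inputs are assumed to be per-cycle glottal periods (seconds) and per-cycle peak amplitudes that have already been extracted.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values over the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    return mean_diff / mean_value


def local_jitter(periods_s):
    """Cycle-to-cycle period variability (assumed 'local' definition)."""
    return local_perturbation(periods_s)


def local_shimmer(amplitudes):
    """Cycle-to-cycle amplitude variability (assumed 'local' definition)."""
    return local_perturbation(amplitudes)
```

A perfectly periodic signal yields 0 for both measures; larger values indicate more cycle-to-cycle irregularity.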