991 resultados para VOICE ANALYSIS
Resumo:
This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
Resumo:
tThis paper deals with the potential and limitations of using voice and speech processing to detect Obstruc-tive Sleep Apnea (OSA). An extensive body of voice features has been extracted from patients whopresent various degrees of OSA as well as healthy controls. We analyse the utility of a reduced set offeatures for detecting OSA. We apply various feature selection and reduction schemes (statistical rank-ing, Genetic Algorithms, PCA, LDA) and compare various classifiers (Bayesian Classifiers, kNN, SupportVector Machines, neural networks, Adaboost). S-fold crossvalidation performed on 248 subjects showsthat in the extreme cases (that is, 127 controls and 121 patients with severe OSA) voice alone is able todiscriminate quite well between the presence and absence of OSA. However, this is not the case withmild OSA and healthy snoring patients where voice seems to play a secondary role. We found that thebest classification schemes are achieved using a Genetic Algorithm for feature selection/reduction.
Resumo:
A case study of vocal fold paralysis treatment is described with the help of the voice quality analysis application BioMet®Phon. The case corresponds to a description of a 40 - year old female patient who was diagnosed of vocal fold paralysis following a cardio - pulmonar intervention which required intubation for 8 days and posterior tracheotomy for 15 days. The patient presented breathy and asthenic phon ation, and dysphagia. Six main examinations were conducted during a full year period that the treatment lasted consisting in periodic reviews including video - endostroboscopy, voice analysis and breathing function monitoring. The phoniatrician treatment inc luded 20 sessions of vocal rehabilitation, followed by an intracordal infiltration with Radiesse 8 months after the rehabilitation treatment started followed by 6 sessions of rehabilitation more. The videondoscopy and the voicing quality analysis refer a s ubstantial improvement in the vocal function with recovery in all the measures estimated (jitter, shimmer, mucosal wave contents, glottal closure, harmonic contents and biomechanical function analysis). The paper refers the procedure followed and the results obtained by comparing the longitudinal progression of the treatment, illustrating the utility of voice quality analysis tools in speech therapy.
Resumo:
Perceptual voice analysis is a subjective process. However, despite reports of varying degrees of intrajudge and interjudge reliability, it is widely used in clinical voice evaluation. One of the ways to improve the reliability of this procedure is to provide judges with signals as external standards so that comparison can be made in relation to these anchor signals. The present study used a Klatt speech synthesizer to create a set of speech signals with varying degree of three different voice qualities based on a Cantonese sentence. The primary objective of the study was to determine whether different abnormal voice qualities could be synthesized using the built-in synthesis parameters using a perceptual study. The second objective was to determine the relationship between acoustic characteristics of the synthesized signals and perceptual judgment. Twenty Cantonese-speaking speech pathologists with at least three years of clinical experience in perceptual voice evaluation were asked to undertake two tasks. The first was to decide whether the voice quality of the synthesized signals was normal or not. The second was to decide whether the abnormal signals should be described as rough, breathy, or vocal fry. The results showed that signals generated with a small degree of aspiration noise were perceived as breathiness while signals with a small degree of flutter or double pulsing were perceived as roughness. When the flutter or double pulsing increased further, tremor and vocal fry, rather than roughness, were perceived. Furthermore, the amount of aspiration noise, flutter, or double pulsing required for male voice stimuli was different from that required for the female voice stimuli with a similar level of perceptual breathiness and roughness. These findings showed that changes in perceived vocal quality could be achieved by systematic modifications of synthesis parameters. This opens up the possibility of using synthesized voice signals as external standards or anchors to improve the reliability of clinical perceptual voice evaluation. (C) 2002 Acoustical Society of America.
Resumo:
Medical fields requires fast, simple and noninvasive methods of diagnostic techniques. Several methods are available and possible because of the growth of technology that provides the necessary means of collecting and processing signals. The present thesis details the work done in the field of voice signals. New methods of analysis have been developed to understand the complexity of voice signals, such as nonlinear dynamics aiming at the exploration of voice signals dynamic nature. The purpose of this thesis is to characterize complexities of pathological voice from healthy signals and to differentiate stuttering signals from healthy signals. Efficiency of various acoustic as well as non linear time series methods are analysed. Three groups of samples are used, one from healthy individuals, subjects with vocal pathologies and stuttering subjects. Individual vowels/ and a continuous speech data for the utterance of the sentence "iruvarum changatimaranu" the meaning in English is "Both are good friends" from Malayalam language are recorded using a microphone . The recorded audio are converted to digital signals and are subjected to analysis.Acoustic perturbation methods like fundamental frequency (FO), jitter, shimmer, Zero Crossing Rate(ZCR) were carried out and non linear measures like maximum lyapunov exponent(Lamda max), correlation dimension (D2), Kolmogorov exponent(K2), and a new measure of entropy viz., Permutation entropy (PE) are evaluated for all three groups of the subjects. Permutation Entropy is a nonlinear complexity measure which can efficiently distinguish regular and complex nature of any signal and extract information about the change in dynamics of the process by indicating sudden change in its value. The results shows that nonlinear dynamical methods seem to be a suitable technique for voice signal analysis, due to the chaotic component of the human voice. Permutation entropy is well suited due to its sensitivity to uncertainties, since the pathologies are characterized by an increase in the signal complexity and unpredictability. Pathological groups have higher entropy values compared to the normal group. The stuttering signals have lower entropy values compared to the normal signals.PE is effective in charaterising the level of improvement after two weeks of speech therapy in the case of stuttering subjects. PE is also effective in characterizing the dynamical difference between healthy and pathological subjects. This suggests that PE can improve and complement the recent voice analysis methods available for clinicians. The work establishes the application of the simple, inexpensive and fast algorithm of PE for diagnosis in vocal disorders and stuttering subjects.
Resumo:
Nowadays, noninvasive methods of diagnosis have increased due to demands of the population that requires fast, simple and painless exams. These methods have become possible because of the growth of technology that provides the necessary means of collecting and processing signals. New methods of analysis have been developed to understand the complexity of voice signals, such as nonlinear dynamics aiming at the exploration of voice signals dynamic nature. The purpose of this paper is to characterize healthy and pathological voice signals with the aid of relative entropy measures. Phase space reconstruction technique is also used as a way to select interesting regions of the signals. Three groups of samples were used, one from healthy individuals and the other two from people with nodule in the vocal fold and Reinke`s edema. All of them are recordings of sustained vowel /a/ from Brazilian Portuguese. The paper shows that nonlinear dynamical methods seem to be a suitable technique for voice signal analysis, due to the chaotic component of the human voice. Relative entropy is well suited due to its sensibility to uncertainties, since the pathologies are characterized by an increase in the signal complexity and unpredictability. The results showed that the pathological groups had higher entropy values in accordance with other vocal acoustic parameters presented. This suggests that these techniques may improve and complement the recent voice analysis methods available for clinicians. (C) 2008 Elsevier Inc. All rights reserved.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Electroglottographic analysis of actresses and nonactresses' voices in different levels of intensity
Resumo:
Background: Previous studies with long-term average spectrum (LTAS) showed the importance of the glottal source for understanding the projected voices of actresses. In this study, electroglottographic (EGG) analysis was used to investigate the contribution of the glottal source to the projected voice, comparing actresses and nonactresses' voices, in different levels of intensity. Method: Thirty actresses and 30 nonactresses sustained vowels in habitual, moderate, and loud intensity levels. The EGG variables were contact quotient (CQ), closing quotient (QCQ), and opening quotient (QOQ). Other variables were sound pressure level (SPL) and fundamental frequency (F0). A KayPENTAX EGG was used. Variables were inputted in a general linear model. Results/Discussion: Actresses showed significantly higher values for SPL, in all levels, and both groups increased SPL significantly while changing from habitual to moderate and further to loud. There were no significant differences between groups for EGG quotients. There were significant differences between the levels only for F0 and CQ for both groups. Conclusion: SPL was significantly higher among actresses in all intensity levels, but in the EGG analysis, no differences were found. This apparently weak contribution of the glottal source in the supposedly projected voices of actresses, contrary to previous LTAS studies, might be because of a higher subglottal pressure or perhaps greater vocal tract contribution in SPL. Results from the present study suggest that trained subjects did not produce a significant higher SPL than untrained individuals by increasing the cost in terms of higher vocal fold collision and hence more impact stress. Future researches should explore the difference between trained and nontrained voices by aerodynamic measurements to evaluate the relationship between physiologic findings and the acoustic and EGG data. Moreover, further studies should consider both types of vocal tasks, sustained vowel and running speech, for both EGG and LTAS analysis. © 2013 The Voice Foundation.
Resumo:
Dysphonia is more prevalent in teachers than among the general population. The objective of this study was to analyze clinical, vocal, and videolaryngoscopical aspects in dysphonic teachers. Ninety dysphonic teachers were inquired about their voice, comorbidities, and work conditions. They underwent vocal auditory-perceptual evaluation (maximum phonation time and GRBASI scale), acoustic voice analysis, and videolaryngoscopy. The results were compared with a control group consisting of 90 dysphonic nonteachers, of similar gender and ages, and with professional activities excluding teaching and singing. In both groups, there were 85 women and five men (age range 31-50 years). In the controls, the majority of subjects worked in domestic activities, whereas the majority of teachers worked in primary (42.8%) and secondary school (37.7%). Teachers and controls reported, respectively: vocal abuse (76.7%; 37.8%), weekly hours of work between 21 and 40 years (72.2%; 80%), under 10 years of practice (36%; 23%), absenteeism (23%; 0%), sinonasal (66%; 20%) and gastroesophageal symptoms (44%; 22%), hoarseness (82%; 78%), throat clearing (70%; 62%), and phonatory effort (72%; 52%). In both groups, there were decreased values of maximum phonation time, impairment of the G parameter in the GRBASI scale (82%), decrease of F0 and increase of the rest of acoustic parameters. Nodules and laryngopharyngeal reflux were predominant in teachers; laryngopharyngeal reflux, polyps, and sulcus vocalis predominated in the controls. Vocal symptoms, comorbidities, and absenteeism were predominant among teachers. The vocal analyses were similar in both groups. Nodules and laryngopharyngeal reflux were predominant among teachers, whereas polyps, laryngopharyngeal reflux, and sulcus were predominant among controls.
Resumo:
Primary voice production occurs in the larynx through vibrational movements carried out by vocal folds. However, many problems can affect this complex system resulting in voice disorders. In this context, time-frequency-shape analysis based on embedding phase space plots and nonlinear dynamics methods have been used to evaluate the vocal fold dynamics during phonation. For this purpose, the present work used high-speed video to record the vocal fold movements of three subjects and extract the glottal area time series using an image segmentation algorithm. This signal is used for an optimization method which combines genetic algorithms and a quasi-Newton method to optimize the parameters of a biomechanical model of vocal folds based on lumped elements (masses, springs and dampers). After optimization, this model is capable of simulating the dynamics of recorded vocal folds and their glottal pulse. Bifurcation diagrams and phase space analysis were used to evaluate the behavior of this deterministic system in different circumstances. The results showed that this methodology can be used to extract some physiological parameters of vocal folds and reproduce some complex behaviors of these structures contributing to the scientific and clinical evaluation of voice production. (C) 2010 Elsevier Inc. All rights reserved.
Resumo:
The dramatic impact of neurological degenerative pathologies in life quality is a growing concern. It is well known that many neurological diseases leave a fingerprint in voice and speech production. Many techniques have been designed for the detection, diagnose and monitoring the neurological disease. Most of them are costly or difficult to extend to primary attention medical services. Through the present paper it will be shown how some neurological diseases can be traced at the level of phonation. The detection procedure would be based on a simple voice test. The availability of advanced tools and methodologies to monitor the organic pathology of voice would facilitate the implantation of these tests. The paper hypothesizes that some of the underlying mechanisms affecting the production of voice produce measurable correlates in vocal fold biomechanics. A general description of the methodological foundations for the voice analysis system which can estimate correlates to the neurological disease is shown. Some study cases will be presented to illustrate the possibilities of the methodology to monitor neurological diseases by voice
Resumo:
Student voice data is a key factor as Manchester Metropolitan University strives to continually improve institutional technology enhanced learning (TEL) infrastructure. A bi-annual Institutional Student Survey enables students to communicate their experience of learning, teaching and assessment on programmes and specific units studied. Each cycle of the survey contains approximately 40–50,000 free text comments from students pertaining to what they appreciate and what they would like to see improved. A detailed thematic analysis of this data has identified 18 themes, arranged into six categories relating to the ‘Best’ aspects of courses, and 25 themes, arranged in seven categories in relation to aspects of courses considered to be ‘in need of improvement’. This student data was then used as a basis for semi-structured interviews with staff. Anecdotally, evidence suggested that student expectations and staff expectations around TEL and the virtual learning environment (VLE) differed. On-going evaluation of this work has highlighted a disconnect. In significant instances, academic colleagues seemingly misinterpret the student voice analysis and consequently struggle to respond effectively. In response to the analysis, the learning technologist's role has been to re-interpret the analysis and redevelop TEL staff development and training activities. The changes implemented have focused on: contextualising resources in VLE; making lectures more interactive; enriching the curriculum with audio–visual resources; and setting expectations around communications.
Resumo:
OBJECTIVE To compare the effectiveness of two speech therapy interventions, vocal warm-up and breathing training, focusing on teachers’ voice quality.METHODS A single-blind, randomized, parallel clinical trial was conducted. The research included 31 20 to 60-year old teachers from a public school in Salvador, BA, Northeasatern Brazil, with minimum workloads of 20 hours a week, who have or have not reported having vocal alterations. The exclusion criteria were the following: being a smoker, excessive alcohol consumption, receiving additional speech therapy assistance while taking part in the study, being affected by upper respiratory tract infections, professional use of the voice in another activity, neurological disorders, and history of cardiopulmonary pathologies. The subjects were distributed through simple randomization in groups vocal warm-up (n = 14) and breathing training (n = 17). The teachers’ voice quality was subjectively evaluated through the Voice Handicap Index (Índice de Desvantagem Vocal, in the Brazilian version) and computerized voice analysis (average fundamental frequency, jitter, shimmer, noise, and glottal-to-noise excitation ratio) by speech therapists.RESULTS Before the interventions, the groups were similar regarding sociodemographic characteristics, teaching activities, and vocal quality. The variations before and after the intervention in self-assessment and acoustic voice indicators have not significantly differed between the groups. In the comparison between groups before and after the six-week interventions, significant reductions in the Voice Handicap Index of subjects in both groups were observed, as wells as reduced average fundamental frequencies in the vocal warm-up group and increased shimmer in the breathing training group. Subjects from the vocal warm-up group reported speaking more easily and having their voices more improved in a general way as compared to the breathing training group.CONCLUSIONS Both interventions were similar regarding their effects on the teachers’ voice quality. However, each contribution has individually contributed to improve the teachers’ voice quality, especially the vocal warm-up.TRIAL RECORD NCT02102399, “Vocal Warm-up and Respiratory Muscle Training in Teachers”.
Resumo:
Background: Voice processing in real-time is challenging. A drawback of previous work for Hypokinetic Dysarthria (HKD) recognition is the requirement of controlled settings in a laboratory environment. A personal digital assistant (PDA) has been developed for home assessment of PD patients. The PDA offers sound processing capabilities, which allow for developing a module for recognition and quantification HKD. Objective: To compose an algorithm for assessment of PD speech severity in the home environment based on a review synthesis. Methods: A two-tier review methodology is utilized. The first tier focuses on real-time problems in speech detection. In the second tier, acoustics features that are robust to medication changes in Levodopa-responsive patients are investigated for HKD recognition. Keywords such as Hypokinetic Dysarthria , and Speech recognition in real time were used in the search engines. IEEE explorer produced the most useful search hits as compared to Google Scholar, ELIN, EBRARY, PubMed and LIBRIS. Results: Vowel and consonant formants are the most relevant acoustic parameters to reflect PD medication changes. Since relevant speech segments (consonants and vowels) contains minority of speech energy, intelligibility can be improved by amplifying the voice signal using amplitude compression. Pause detection and peak to average power rate calculations for voice segmentation produce rich voice features in real time. Enhancements in voice segmentation can be done by inducing Zero-Crossing rate (ZCR). Consonants have high ZCR whereas vowels have low ZCR. Wavelet transform is found promising for voice analysis since it quantizes non-stationary voice signals over time-series using scale and translation parameters. In this way voice intelligibility in the waveforms can be analyzed in each time frame. Conclusions: This review evaluated HKD recognition algorithms to develop a tool for PD speech home-assessment using modern mobile technology. An algorithm that tackles realtime constraints in HKD recognition based on the review synthesis is proposed. We suggest that speech features may be further processed using wavelet transforms and used with a neural network for detection and quantification of speech anomalies related to PD. Based on this model, patients' speech can be automatically categorized according to UPDRS speech ratings.
Resumo:
The genus Herpsilochmus is composed mainly of cryptic species, among them is Herpsilochmus rufimarginatus, which is currently represented by four subspecies: H. r. rufimarginatus, H. r. frater, H. r. scapularis and H. r. exiguus. Differences in plumage and vocalization suggest that there are more than one species involved in this complex. Thus this and other subspecific taxa need urgent revision, the disjunct distribution of this species also allows us to infer the relationship between birds that occur in this biome and / or different centers of endemism. This study aims to make a taxonomic revision of the taxa included in the complex time Herpsilochmus rufimarginatus based on morphological, morphometric, vocals and geographical distribution of this bird. Besides creating distribution models current potential and make the reconstruction of the distribution bygone using ecological niche modeling, and testing the niche conservatism and divergence between different subspecies. Consultations for examination of the skins of specimens of the museums: Museum of Zoology, University of São Paulo (MZUSP), National Museum of Rio de Janeiro (MN) and Emilio Goeldi Museum of Pará (MPEG), and the skins deposited at the collection of Ornithological Federal University of Rio Grande do Norte (COUFRN). We studied the following measures length of specimens: exposed culmen, culmen and total culmen nostril, tarsus, wing and tail flattened. The voice analysis was performed with vocalizations banks and / or digital banks people where 17 voice parameters were measured. This information and more available in the literature were used to assemble a bunch of data under the limit distribution of taxa and generate ecological niche models. This analyzes carried out in the program Maxent, having as model selection criterion the AUC, and the models were greater than 0.80 are considered good models. Environmental data for the realization of the modeling were downloaded on the website of Worldclim. The morphometric information, vocals and geographic distribution point for the separation of these taxa to be considering various uni and multivariate analyzes. The potential distribution models performed well (AUC> 0.80), and its distribution associated with environmental characteristics of the Amazon forest and Atlantic forest (forests of south and southeast, northeast and forest). The reconstruction of the distribution indicates a possible contact between the southern part of the Atlantic forest in the northern part of the Amazon. The analysis of niche overlap showed a low overlap between taxa and comparisons between the null model and the generated overlay link probably occurring niche conservatism. The data suggest that the taxa that occur in the Amazon and Atlantic forest represent three distinct species