807 results for Freedom of speech.
Abstract:
The aim of this thesis is to investigate computerized voice assessment methods to classify normal and dysarthric speech signals. In the proposed system, computerized assessment methods equipped with signal processing and artificial intelligence techniques are introduced. The sentences used for the measurement of inter-stress intervals (ISI) were read by each subject and were analyzed for comparisons between normal and impaired voice. A band-pass filter was used for preprocessing of the speech samples. Speech segmentation was performed using signal energy and the spectral centroid to separate voiced and unvoiced regions of the speech signal. Acoustic features were extracted from the LPC model and from the speech segments of each audio signal to find anomalies. The speech features assessed for classification were energy entropy, zero-crossing rate (ZCR), spectral centroid, mean fundamental frequency (meanF0), jitter (RAP), jitter (PPQ), and shimmer (APQ). Naïve Bayes (NB) was used for speech classification. For speech tests 1 and 2, classification accuracies of 72% and 80%, respectively, were achieved between healthy and impaired speech samples using NB. For speech test 3, 64% correct classification was achieved using NB. The results indicate the possibility of classifying speech impairment in PD patients based on the clinical rating scale.
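To make the pipeline concrete, the following is a minimal sketch (not the thesis code) of two of the listed features, zero-crossing rate and spectral centroid, feeding a Naïve Bayes classifier. The frame length, the random placeholder data, and the helper name frame_features are illustrative assumptions.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def frame_features(signal, sr, frame_len=0.02):
    """Per-frame zero-crossing rate (ZCR) and spectral centroid."""
    n = int(frame_len * sr)
    feats = []
    for start in range(0, len(signal) - n, n):
        frame = signal[start:start + n]
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(n, 1 / sr)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
        feats.append((zcr, centroid))
    return np.array(feats)

# Placeholder random signals stand in for the real recordings;
# y: 0 = healthy, 1 = dysarthric.
rng = np.random.default_rng(0)
X = np.vstack([frame_features(rng.standard_normal(16000), 16000).mean(axis=0)
               for _ in range(20)])
y = np.array([0, 1] * 10)
clf = GaussianNB().fit(X, y)
print(clf.predict(X[:2]))  # predicted labels for two samples

In practice each utterance would be summarized by the full feature set listed in the abstract (jitter, shimmer, meanF0, energy entropy) rather than the two features shown here.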
Abstract:
Background: Voice processing in real time is challenging. A drawback of previous work on Hypokinetic Dysarthria (HKD) recognition is the requirement of controlled settings in a laboratory environment. A personal digital assistant (PDA) has been developed for home assessment of PD patients. The PDA offers sound-processing capabilities, which allow for developing a module for recognition and quantification of HKD. Objective: To compose an algorithm for assessment of PD speech severity in the home environment based on a review synthesis. Methods: A two-tier review methodology was utilized. The first tier focused on real-time problems in speech detection. In the second tier, acoustic features that are robust to medication changes in Levodopa-responsive patients were investigated for HKD recognition. Keywords such as "Hypokinetic Dysarthria" and "speech recognition in real time" were used in the search engines. IEEE Xplore produced the most useful search hits compared with Google Scholar, ELIN, EBRARY, PubMed and LIBRIS. Results: Vowel and consonant formants are the acoustic parameters most relevant to PD medication changes. Since the relevant speech segments (consonants and vowels) contain a minority of the speech energy, intelligibility can be improved by amplifying the voice signal using amplitude compression. Pause detection and peak-to-average power calculations for voice segmentation produce rich voice features in real time. Voice segmentation can be further enhanced by incorporating the zero-crossing rate (ZCR): consonants have a high ZCR, whereas vowels have a low ZCR. The wavelet transform is promising for voice analysis since it decomposes non-stationary voice signals over a time series using scale and translation parameters, so that voice intelligibility in the waveform can be analyzed in each time frame. Conclusions: This review evaluated HKD recognition algorithms in order to develop a tool for PD speech home assessment using modern mobile technology. An algorithm that tackles real-time constraints in HKD recognition, based on the review synthesis, is proposed. We suggest that speech features may be further processed using wavelet transforms and used with a neural network for detection and quantification of speech anomalies related to PD. Based on this model, patients' speech can be automatically categorized according to UPDRS speech ratings.
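As an illustration of the segmentation measures highlighted in the review, here is a small sketch (assumptions, not the proposed algorithm) that labels 20 ms frames as pause, consonant-like, or vowel-like using frame power and ZCR, and also reports the peak-to-average power ratio per frame; the thresholds are placeholders.

import numpy as np

def segment_frames(signal, sr, frame_ms=20, zcr_threshold=0.15, power_floor=1e-4):
    """Label frames as pause, consonant-like (high ZCR) or vowel-like (low ZCR)."""
    n = int(sr * frame_ms / 1000)
    labels, paprs = [], []
    for start in range(0, len(signal) - n, n):
        frame = signal[start:start + n]
        power = np.mean(frame ** 2)
        paprs.append(np.max(frame ** 2) / (power + 1e-12))  # peak-to-average power ratio
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        if power < power_floor:        # near-silent frame -> pause
            labels.append("pause")
        elif zcr > zcr_threshold:      # rapid sign alternation -> consonant-like
            labels.append("consonant")
        else:                          # strong periodic energy -> vowel-like
            labels.append("vowel")
    return labels, paprs

sr = 8000
demo = np.sin(2 * np.pi * 180 * np.arange(sr) / sr)  # vowel-like periodic tone
labels, paprs = segment_frames(demo, sr)
print(labels[:5], round(paprs[0], 2))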
Abstract:
The purpose of this study was to determine the influence of hearing protection devices (HPDs) on the understanding of speech in young adults with normal hearing, both in silence and in the presence of ambient noise. The experimental research was carried out with the following variables: five conditions of HPD use (without protectors, with two types of earplugs and with two types of earmuffs); one type of noise (pink noise); four test levels (60, 70, 80 and 90 dB[A]); six signal-to-noise ratios (without noise, +10, +5, 0, -5 and -10 dB); and five repetitions of each case, totalling 600 tests with 10 monosyllables in each one. The measured variable was the percentage of correctly heard (monosyllabic) words in the test. The results revealed that, at the lowest levels (60 and 70 dB), the protectors reduced the intelligibility of speech (compared with the tests without protectors), while in the presence of ambient noise levels of 80 and 90 dB and unfavourable signal-to-noise ratios (0, -5 and -10 dB) the HPDs improved intelligibility. A comparison of the effectiveness of earplugs versus earmuffs showed that the former offer greater efficiency with respect to the recognition of speech, providing a 30% improvement over situations in which no protection is used. As might be expected, this study confirmed that a protector's influence on speech intelligibility is related directly to the spectral curve of the protector's attenuation.
Abstract:
Speech signals degraded by additive noise can affect different telecommunication applications. The noise may degrade both the intelligibility of the speech signals and their waveforms. In some applications, such as speech coding, both intelligibility and waveform quality are important, but lately the focus has been on intelligibility alone. As a result, modern speech quality measurement techniques such as PESQ (Perceptual Evaluation of Speech Quality) have come into widespread use, while classical distortion measures such as the cepstral distance are falling out of use. This paper shows that some classical distortion measures are still important in applications where speech corrupted by additive noise has to be evaluated.
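For illustration, a minimal sketch of one classical measure named above, the cepstral distance, computed here from the real cepstrum of a clean and a degraded frame; the dB scaling shown is one common form of the measure, and the signal data are placeholders.

import numpy as np

def real_cepstrum(frame):
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12
    return np.fft.irfft(np.log(spectrum))

def cepstral_distance(clean, degraded, order=12):
    """Distance over the first `order` cepstral coefficients (c0 excluded)."""
    c1 = real_cepstrum(clean)[1:order + 1]
    c2 = real_cepstrum(degraded)[1:order + 1]
    # One common dB-scaled form of the cepstral distance measure.
    return (10.0 / np.log(10)) * np.sqrt(2.0 * np.sum((c1 - c2) ** 2))

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 200 * np.arange(512) / 8000)
noisy = clean + 0.1 * rng.standard_normal(512)
print(f"CD = {cepstral_distance(clean, noisy):.2f} dB")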
Abstract:
Granulomatous lesions are frequently found in infectious diseases; they can involve the larynx and pharynx and cause varying degrees of dysphonia and dysphagia. There is still no systematic review analyzing the effectiveness of speech therapy in systemic granulomatous diseases. Research strategy: A systematic review was performed according to Cochrane guidelines, considering for inclusion RCTs and quasi-RCTs on the effectiveness of speech-language therapy for treating dysphagia and dysphonia symptoms in systemic granulomatous diseases of the larynx and pharynx. Selection criteria: The outcomes planned to be measured in this review were swallowing impairment, frequency of chest infections, and voice and swallowing symptoms. Data analysis: We identified 1,140 citations across all electronic databases. After an initial sift, we selected only 9 titles to be retrieved in full text. After full reading, no RCT was found, and we therefore describe only the 2 existing case-series studies. Results: No randomized controlled trials were found in the literature; two studies were therefore included for narrative analysis only, as they were case series. Conclusion: There is no evidence from high-quality studies on the effectiveness of speech-language therapy in patients with granulomatous diseases of the larynx and pharynx. Investigators could rely on the outcomes suggested in this review to design their own clinical trials.
Abstract:
This study investigated the influence of top-down and bottom-up information on speech perception in complex listening environments. Specifically, the effects of listening to different types of processed speech were examined on intelligibility and on simultaneous visual-motor performance. The goal was to extend the generalizability of results in speech perception to environments outside the laboratory. The effect of bottom-up information was evaluated with natural, cell phone, and synthetic speech. The effect of simultaneous tasks was evaluated with concurrent visual-motor and memory tasks. Earlier work on the perception of speech during simultaneous visual-motor tasks has shown inconsistent results (Choi, 2004; Strayer & Johnston, 2001). In the present experiments, two dual-task paradigms were constructed to mimic non-laboratory listening environments. In the first two experiments, an auditory word repetition task was the primary task and a visual-motor task was the secondary task. Participants were presented with different kinds of speech in a background of multi-speaker babble and were asked to repeat the last word of every sentence while performing the simultaneous tracking task. Word accuracy and visual-motor task performance were measured. Taken together, the results of Experiments 1 and 2 showed that natural speech was more intelligible than synthetic speech, and that synthetic speech was better perceived than cell phone speech. The visual-motor methodology provided independent, supplementary information and a better understanding of the entire speech perception process. Experiment 3 was conducted to determine whether the automaticity of the tasks (Schneider & Shiffrin, 1977) helped to explain the results of the first two experiments. Cell phone speech allowed better simultaneous pursuit-rotor performance only at low intelligibility levels, when participants ignored the listening task. Also, simultaneous task performance improved dramatically for natural speech when intelligibility was good. Overall, it can be concluded that knowledge of intelligibility alone is insufficient to characterize the processing of different speech sources. Additional measures, such as attentional demands and performance of simultaneous tasks, are also important in characterizing the perception of different kinds of speech in complex listening environments.
Abstract:
This study investigated whether there are differences in the speech-evoked Auditory Brainstem Response among children with Typical Development (TD), (Central) Auditory Processing Disorder ((C)APD), and Language Impairment (LI). The speech-evoked Auditory Brainstem Response was tested in 57 children (ages 6-12), placed into three groups: TD (n = 18), (C)APD (n = 18), and LI (n = 21). Speech-evoked ABRs were elicited using the five-formant syllable /da/. Three dimensions were defined for analysis: timing, harmonics, and pitch. A comparative analysis of the responses between the typically developing children and the children with (C)APD and LI revealed abnormal encoding of the speech acoustic features that are characteristic of speech perception in children with (C)APD and LI, although the two groups differed in their abnormalities. While the children with (C)APD may have had greater difficulty distinguishing stimuli based on timing cues, the children with LI had the additional difficulty of distinguishing speech harmonics, which are important to the identification of speech sounds. These data suggest that an inefficient representation of crucial components of speech sounds may contribute to the difficulties with language processing found in children with LI. Furthermore, these findings may indicate that the neural processes mediated by the auditory brainstem differ among children with auditory processing and speech-language disorders.
Abstract:
Background: Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is based exclusively on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is in order. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network of nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. Methodology/Principal Findings: To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics on ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were captured by the analysis even when verbosity differences were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% sensitivity and 93.7% specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% sensitivity and specificity. Conclusions/Significance: The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said.
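The graph construction can be illustrated with a toy sketch; this version uses a simple word-adjacency rule for edges (the study also encoded semantic and grammatical links, which are not reproduced here), and the repeated-edge count stands in loosely for the recurrence-style measures mentioned above.

import networkx as nx

def speech_graph(transcript):
    """Nodes are words; each pair of consecutive words adds a directed edge."""
    words = transcript.lower().split()
    g = nx.MultiDiGraph()  # parallel edges keep track of repeated transitions
    for w1, w2 in zip(words, words[1:]):
        g.add_edge(w1, w2)
    return g

g = speech_graph("the plan was the plan was the plan")
print({
    "nodes": g.number_of_nodes(),
    "edges": g.number_of_edges(),
    # Parallel (repeated) edges as a crude stand-in for recurrence measures.
    "repeated_edges": g.number_of_edges() - nx.DiGraph(g).number_of_edges(),
})

Measures of this kind, computed per interview, would then feed the binary classifiers described in the abstract.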
Abstract:
Introduction: In recent years, the benefits associated with the use of cochlear implants (CIs), especially with regard to speech perception, have proven to surpass those produced by hearing aids, making CIs a highly efficient resource for patients with severe/profound hearing loss. However, few studies so far have assessed the satisfaction of adult CI users. Objective: To analyze the relationship between the level of speech perception and the degree of satisfaction of adult CI users. Method: This was a prospective cross-sectional study conducted at the Audiological Research Center (CPA) of the Hospital of Craniofacial Anomalies, University of São Paulo (HRAC/USP), in Bauru, São Paulo, Brazil. A total of 12 CI users with pre-lingual or post-lingual hearing loss participated in the study. The following tools were used in the assessment: the "Satisfaction with Amplification in Daily Life" (SADL) questionnaire, culturally adapted to Brazilian Portuguese, together with its relationship to the speech perception results; a speech perception test under quiet conditions; and the Hearing in Noise Test (HINT) Brazil under free-field conditions. Results: The participants were on the whole satisfied with their devices, and the degree of satisfaction correlated positively with the ability to perceive monosyllabic words under quiet conditions. Satisfaction did not correlate with the level of speech perception in noisy environments. Conclusion: Assessments of satisfaction may help professionals to predict what other factors, in addition to speech perception, may contribute to the satisfaction of CI users, in order to reorganize the intervention process and improve users' quality of life.
Abstract:
A new idea for waveform coding using vector quantisation (VQ) is introduced. This idea makes it possible to deal with codevectors much larger than before for a fixed bits-per-sample rate. A solution to the matching problem (inherent in the present context) in a norm describing a measure of nearness is also presented. The overall computational complexity of this solution is O(n³ log₂ n). Sample results are presented to demonstrate the advantage of using this technique in the context of coding of speech waveforms.
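For orientation, the following is a schematic sketch of generic waveform VQ (plain k-means training and nearest-neighbour matching in the Euclidean norm, not the paper's large-codevector matching algorithm); the frame length, codebook size, and toy signal are assumptions.

import numpy as np

def train_codebook(frames, k=16, iters=20, seed=0):
    """Plain k-means codebook training over fixed-length waveform frames."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest codevector (Euclidean norm).
        d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = frames[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def encode(frames, codebook):
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)  # one codebook index per frame

# 8-sample frames, 16 codevectors: 4 bits per frame = 0.5 bits per sample.
rng = np.random.default_rng(1)
signal = np.sin(2 * np.pi * 150 * np.arange(4096) / 8000) + 0.05 * rng.standard_normal(4096)
frames = signal.reshape(-1, 8)
codebook = train_codebook(frames)
reconstructed = codebook[encode(frames, codebook)].ravel()
print("reconstruction MSE:", np.mean((signal - reconstructed) ** 2))

Larger codevectors (longer frames) at the same bit rate require exponentially larger codebooks, which is exactly the matching-cost problem the abstract's O(n³ log₂ n) solution addresses.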
Abstract:
The Internet has affected our lives and society in manifold, and partly fundamental, ways. It is therefore no surprise that one of the affected areas is language and communication itself. Over the last few years, online social networks have become a widespread and continuously expanding medium of communication. Being a new medium of social interaction, online social networks produce their own communication style, which in many cases differs considerably from real speech and is also perceived differently. The focus of analysis of my PhD thesis is how social network users from the city of Malaga create this virtual style by means of phonic features typical of the Andalusian variety of Spanish, and how the users' language attitudes influence the use of these phonic features. The data collection was fourfold: 1) a main corpus was compiled from 240 informants' utterances on Facebook and Tuenti; 2) a corpus of broad transcriptions of recordings of 120 people from Malaga served as a comparison; 3) a survey was carried out in which 240 participants rated the use of the phonetic variants in question on the axes "good-bad", "correct-incorrect" and "beautiful-ugly"; 4) a survey was conducted with 240 participants who estimated the frequency with which the analysed features are used in Malaga. For the analysis, which is both quantitative and qualitative, ten variables were chosen. The results show that the studied variants are employed differently in virtual and real speech, depending on how people perceive these variants. In addition, the use of the features is constrained by social factors. In general, people from Malaga have a more positive attitude towards non-standard features used in virtual speech than in real speech. Thus, virtual communication is seen as a style serving to create social meaning and to express linguistic identity. These stylistic practices reflect an amalgam of social presuppositions about usage conventions and individual strategies for handling a new medium. In sum, the virtual style is an initiative deliberately taken by the users to create their identities, real and virtual, and to define their language attitudes towards the features of their variety of speech.
Abstract:
Comprehending speech is one of the most important human behaviors, but we are only beginning to understand how the brain accomplishes this difficult task. One key to speech perception seems to be that the brain integrates the independent sources of information available in the auditory and visual modalities in a process known as multisensory integration. This allows speech perception to be accurate even in environments in which one modality or the other is rendered ambiguous by noise. Previous electrophysiological and functional magnetic resonance imaging (fMRI) experiments have implicated the posterior superior temporal sulcus (STS) in auditory-visual integration of both speech and non-speech stimuli. While prior imaging studies have found increases in STS activity for audiovisual speech compared with unisensory auditory or visual speech, they do not provide a clear mechanism for how the STS communicates with early sensory areas to integrate the two streams of information into a coherent audiovisual percept. Furthermore, it is currently unknown whether activity within the STS is directly correlated with the strength of audiovisual perception. To better understand the cortical mechanisms that underlie audiovisual speech perception, we first studied STS activity and connectivity during the perception of speech with auditory and visual components of varying intelligibility. By studying fMRI activity during these noisy audiovisual speech stimuli, we found that STS connectivity with auditory and visual cortical areas mirrored perception: when the information from one modality is unreliable and noisy, the STS interacts less with the cortex processing that modality and more with the cortex processing the reliable information. We next characterized the role of STS activity during a striking audiovisual speech illusion, the McGurk effect, to determine whether activity within the STS predicts how strongly a person integrates auditory and visual speech information. Subjects with greater susceptibility to the McGurk effect exhibited stronger fMRI activation of the STS during perception of McGurk syllables, implying a direct correlation between the strength of audiovisual integration of speech and activity within the multisensory STS.
Abstract:
The New Cockney provides a sociolinguistic account of speech variation among adolescents in the 'traditional' East End of London. The study takes account of the social and economic upheaval in the area since the 1950s, primarily concentrating on factors such as the immigration of the Bangladeshi community and its effect on the Cockney dialect. By paying attention to the particular, this book contributes to a better understanding of the more general concerns of linguistic variation. With a focus on the interaction and social practices of a group of adolescents attending a youth centre, the study highlights some of the possible mechanisms for language change.
Abstract:
Public participation is an integral part of Environmental Impact Assessment (EIA) and, as such, has been incorporated into regulatory norms. Assessment of the effectiveness of public participation has remained elusive, however, partly because of the difficulty of identifying appropriate effectiveness criteria. This research uses Q methodology to discover and analyze stakeholders' social perspectives on the effectiveness of EIAs in the Western Cape, South Africa. It considers two case studies (the Main Road and Saldanha Bay EIAs) for contextual participant perspectives on effectiveness based on experience, and further considers the more general opinion of provincial consent regulator staff at the Department of Environmental Affairs and the Department of Planning (DEA&DP). Two main themes of investigation are drawn from the South African National Environmental Management Act (NEMA) imperative for effectiveness: first, the participation procedure, and second, the stakeholder capabilities necessary for effective participation. Four theoretical frameworks drawn from planning, politics and EIA theory are adapted to public participation and used to triangulate the analysis and discussion of the revealed social perspectives. They consider citizen power in deliberation, Habermas' preconditions for the Ideal Speech Situation (ISS), a Foucauldian perspective on knowledge, power and politics, and a Capabilities Approach to public participation effectiveness. The empirical evidence from this research shows that the capacity and contextual constraints faced by participants demand the legislative imperatives for effective participation set out in the NEMA. The implementation of effective public participation has been shown to be a complex, dynamic and sometimes nebulous practice. Participants' functional understanding of the process was found to vary widely, with the consequence of unequal and dissatisfied stakeholder engagements. Furthermore, the considerable variance in stakeholder capabilities in the South African social context resulted in inequalities in deliberation. The social perspectives revealed significant differences in participant experience of citizen power in deliberation. The ISS preconditions are highly contested in both the Saldanha Bay EIA case study and the DEA&DP social perspectives. Only one Main Road EIA case study social perspective considered Foucault's notion of governmentality a reality in EIA public participation. The freedom to control one's environment, based on a Capabilities Approach, is a highly contested notion: although agreed with in principle, all of the social perspectives indicate that contextual and capacity realities constrain its realisation. This research has shown that Q methodology can be applied to EIA public participation in South Africa and, with appropriate research or monitoring applications, could serve as a useful feedback tool to inform best-practice public participation.
Abstract:
In the last two decades there has been an important increase in speech technology research in Spain, mainly due to a higher level of funding from European, Spanish and local institutions, and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in recent years, and their main current focus of interest. The description is organized into five main areas: audio processing including speech; speaker characterization; speech and language processing; text-to-speech conversion; and spoken language applications. The paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla), the research network that includes almost all the researchers working in this area, presenting some figures, its objectives, and its main activities in recent years.