46 results for Freedom of speech.
Abstract:
This thesis addresses the viability of automatic speech recognition for control room systems; with careful system design, automatic speech recognition (ASR) devices can be a useful means for human computer interaction in specific types of task. These tasks can be defined as complex verbal activities, such as command and control, and can be paired with spatial tasks, such as monitoring, without detriment. It is suggested that ASR use be confined to routine plant operation, as opposed to critical incidents, due to possible problems of stress on the operators' speech. It is proposed that using ASR will require operators to adapt a commonly used skill to cater for a novel use of speech. Before using the ASR device, new operators will require some form of training. It is shown that a demonstration by an experienced user of the device can lead to better performance than instructions. Thus, a relatively cheap and very efficient form of operator training can be supplied by demonstration by experienced ASR operators. From a series of studies into speech based interaction with computers, it is concluded that the interaction be designed to capitalise upon the tendency of operators to use short, succinct, task specific styles of speech. From studies comparing different types of feedback, it is concluded that operators be given screen based feedback, rather than auditory feedback, for control room operation. Feedback will take two forms: the use of the ASR device will require recognition feedback, which will be best supplied using text; the performance of a process control task will require task feedback integrated into the mimic display. This latter feedback can be either textual or symbolic, but it is suggested that symbolic feedback will be more beneficial. Related to both interaction style and feedback is the issue of handling recognition errors. These should be corrected by simple command repetition practices, rather than by error handling dialogues.
This method of error correction is held to be non intrusive to primary command and control operations. This thesis also addresses some of the problems of user error in ASR use, and provides a number of recommendations for its reduction.
Abstract:
At present there is no standard assessment method for rating and comparing the quality of synthesized speech. This study assesses the suitability of Time Frequency Warping (TFW) modulation for use as a reference device for assessing synthesized speech. Time Frequency Warping modulation introduces timing errors into natural speech that produce perceptual errors similar to those found in synthetic speech. It is proposed that TFW modulation used in conjunction with a listening effort test would provide a standard assessment method for rating the quality of synthesized speech. This study identifies the most suitable TFW modulation variable parameter to be used for assessing synthetic speech and assesses the results of several assessment tests that rate examples of synthesized speech in terms of the TFW variable parameter and listening effort. The study also attempts to identify the attributes of speech that differentiate synthetic, TFW modulated and natural speech.
Abstract:
The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.
Abstract:
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.
Abstract:
Purpose: Both phonological (speech) and auditory (non-speech) stimuli have been shown to predict early reading skills. However, previous studies have failed to control for the level of processing required by tasks administered across the two levels of stimuli. For example, phonological tasks typically tap explicit awareness (e.g., phoneme deletion), while auditory tasks usually measure implicit awareness (e.g., frequency discrimination). Therefore, the stronger predictive power of speech tasks may be due to their higher processing demands, rather than the nature of the stimuli. Method: The present study uses novel tasks that control for level of processing (isolation, repetition and deletion) across speech (phonemes and nonwords) and non-speech (tones) stimuli. 800 beginning readers at the onset of literacy tuition (mean age 4 years and 7 months) were assessed on the above tasks as well as word reading and letter-knowledge in the first part of a three time-point longitudinal study. Results: Time 1 results reveal a significantly higher association between letter-sound knowledge and all of the speech compared to non-speech tasks. Performance was better for phoneme than tone stimuli, and worse for deletion than isolation and repetition across all stimuli. Conclusions: Results are consistent with phonological accounts of reading and suggest that level of processing required by the task is less important than stimuli type in predicting the earliest stage of reading.
Abstract:
Despite being nominated as a key potential interaction technique for supporting today's mobile technology user, the widespread commercialisation of speech-based input is currently being impeded by unacceptable recognition error rates. Developing effective speech-based solutions for use in mobile contexts, given the varying extent of background noise, is challenging. The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. Specifically, this paper reports on a comparison of three different commercially available microphones in terms of their efficacy to facilitate mobile, speech-based data entry. We describe, in detail, our novel evaluation design as well as the results we obtained.
Abstract:
It is well established that speech, language and phonological skills are closely associated with literacy, and that children with a family risk of dyslexia (FRD) tend to show deficits in each of these areas in the preschool years. This paper examines what the relationships are between FRD and these skills, and whether deficits in speech, language and phonological processing fully account for the increased risk of dyslexia in children with FRD. One hundred and fifty-three 4-6-year-old children, 44 of whom had FRD, completed a battery of speech, language, phonology and literacy tasks. Word reading and spelling were retested 6 months later, and text reading accuracy and reading comprehension were tested 3 years later. The children with FRD were at increased risk of developing difficulties in reading accuracy, but not reading comprehension. Four groups were compared: good and poor readers with and without FRD. In most cases good readers outperformed poor readers regardless of family history, but there was an effect of family history on naming and nonword repetition regardless of literacy outcome, suggesting a role for speech production skills as an endophenotype of dyslexia. Phonological processing predicted spelling, while language predicted text reading accuracy and comprehension. FRD was a significant additional predictor of reading and spelling after controlling for speech production, language and phonological processing, suggesting that children with FRD show additional difficulties in literacy that cannot be fully explained in terms of their language and phonological skills. © 2014 John Wiley & Sons Ltd.
Abstract:
In response to Chaski’s article (published in this volume) an examination is made of the methodological understanding necessary to identify dependable markers for forensic (and general) authorship attribution work. This examination concentrates on three methodological areas of concern which researchers intending to identify markers of authorship must address. These areas are sampling linguistic data, establishing the reliability of authorship markers and establishing the validity of authorship markers. It is suggested that the complexity of sampling problems in linguistic data is often underestimated and that theoretical issues in this area are both difficult and unresolved. It is further argued that the concepts of reliability and validity must be well understood and accounted for in any attempts to identify authorship markers and that largely this is not done. Finally, Principal Component Analysis is identified as an alternative approach which avoids some of the methodological problems inherent in identifying reliable, valid markers of authorship.
Abstract:
As wireless network technologies evolve towards an All-IP framework, Next Generation Wireless Communication Devices demand better use of spectral resources by employing advanced techniques of silence suppression. This paper presents an analysis of VoIP call data and compares the statistical results based on observed patterns of talk spurts and silence lengths to those achieved by a modified on-off voice model for silence suppression in wireless networks. As talk spurts and silence lengths are sensitive to varying word lengths, temporal structure and other prosodic aspects of speech, the impact of the use of various languages, dialects and gender of speakers on these results is also assessed.
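As an illustration of the kind of on-off voice model mentioned above, the following is a minimal sketch of the classic two-state version, assuming exponentially distributed talk spurt and silence durations. It is not the modified model analysed in the paper, which the abstract does not specify; the function names and the mean durations used here are hypothetical and chosen only for illustration.

```python
import random


def activity_factor(mean_talk_s, mean_silence_s):
    """Long-run fraction of time spent talking in a two-state on-off model."""
    return mean_talk_s / (mean_talk_s + mean_silence_s)


def simulate_on_off(mean_talk_s, mean_silence_s, total_s, seed=0):
    """Simulate alternating talk spurts and silences.

    Durations are drawn from exponential distributions (an assumption of
    the classic model, not taken from the paper). Returns the observed
    fraction of time spent in the talk state.
    """
    rng = random.Random(seed)
    elapsed = 0.0
    talk_time = 0.0
    talking = True  # start in the talk state
    while elapsed < total_s:
        mean = mean_talk_s if talking else mean_silence_s
        # Truncate the last period so the simulation ends exactly at total_s.
        duration = min(rng.expovariate(1.0 / mean), total_s - elapsed)
        if talking:
            talk_time += duration
        elapsed += duration
        talking = not talking
    return talk_time / total_s


if __name__ == "__main__":
    # Hypothetical means: 1.0 s talk spurts, 1.5 s silences.
    print("analytic activity factor:", activity_factor(1.0, 1.5))
    print("simulated activity factor:", simulate_on_off(1.0, 1.5, 100000.0))
```

With silence suppression, only the talk state consumes channel capacity, so the activity factor gives a rough idea of the spectral saving such a scheme could offer under these assumptions.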
Abstract:
This work attempts to create a systemic design framework for man-machine interfaces which is self-consistent, compatible with other concepts, and applicable to real situations. This is tackled by examining the current architecture of computer applications packages. The treatment in the main is philosophical and theoretical and analyses the origins, assumptions and current practice of the design of applications packages. It proposes that the present form of packages is fundamentally contradictory to the notion of packaging itself. This is because as an indivisible ready-to-implement solution, current package architecture displays the following major disadvantages. First, it creates problems as a result of user-package interactions, in which the designer tries to mould all potential individual users, no matter how diverse they are, into one model. This is worsened by the minute provision, if any, of important properties such as flexibility, independence and impartiality. Second, it displays rigid structure that reduces the variety and/or multi-use of the component parts of such a package. Third, it dictates specific hardware and software configurations which probably results in reducing the number of degrees of freedom of its user. Fourth, it increases the dependence of its user upon its supplier through inadequate documentation and understanding of the package. Fifth, it tends to cause a degeneration of the expertise of design of the data processing practitioners. In view of this understanding an alternative methodological design framework which is consistent both with the systems approach and with the role of a package in its likely context is proposed. The proposition is based upon an extension of the identified concept of the hierarchy of holons which facilitates the examination of the complex relationships of a package with its two principal environments.
The first is the user's characteristics and decision making practice and procedures, implying an examination of the user's M.I.S. network. The second is the software environment and its influence upon a package regarding support, control and operation of the package. The framework is built gradually as discussion advances around the central theme of a compatible M.I.S., software and model design. This leads to the formation of the alternative package architecture that is based upon the design of a number of independent, self-contained small parts. This is believed to constitute a nucleus around which packages can be designed more effectively, and which is also applicable to the design of many man-machine systems.
Abstract:
The present thesis investigates mode related aspects in biology lecture discourse and attempts to identify the position of this variety along the spontaneous spoken versus planned written language continuum. Nine lectures (of 43,000 words) consisting of three sets of three lectures each, given by the three lecturers at Aston University, make up the corpus. The indeterminacy of the results obtained from the investigation of grammatical complexity as measured in subordination motivates the need to take the analysis beyond sentence level to the study of mode related aspects in the use of sentence-initial connectives, sub-topic shifting and paraphrase. It is found that biology lecture discourse combines features typical of speech and writing at sentence as well as discourse level: thus, subordination is more used than co-ordination, but one degree complexity sentence is favoured; some sentence initial connectives are only found in uses typical of spoken language but sub-topic shift signalling (generally introduced by a connective) typical of planned written language is a major feature of the lectures; syntactic and lexical revision and repetition, interrupted structures are found in the sub-topic shift signalling utterance and paraphrase, but the text is also amenable to analysis into sentence like units. On the other hand, it is also found that: (1) while there are some differences in the use of a given feature, inter-speaker variation is on the whole not significant; (2) mode related aspects are often motivated by the didactic function of the variety; and (3) the structuring of the text follows a sequencing whose boundaries are marked by sub-topic shifting and the summary paraphrase. This study enables us to draw four theoretical conclusions: (1) mode related aspects cannot be approached as a simple dichotomy since a combination of aspects of both speech and writing are found in a given feature. 
It is necessary to go to the level of textual features to identify mode related aspects; (2) homogeneity is dominant in this sample of lectures which suggests that there is a high level of standardization in this variety; (3) the didactic function of the variety is manifested in some mode related aspects; (4) the features studied play a role in the structuring of the text.
Abstract:
The present thesis focuses on the overall structure of the language of two types of Speech Exchange Systems (SES): Interview (INT) and Conversation (CON). The linguistic structures of INT and CON are quantitatively investigated on three different but interrelated levels of analysis: Lexis, Syntax and Information Structure. The corpus of data investigated for the project consists of eight sessions of pairs of conversants in carefully planned interviews followed by unplanned, surreptitiously recorded conversational encounters of the same pairs of speakers. The data comprise a total of approximately 15,200 words of INT talk and about 19,200 words of CON talk. Taking account of the debatable assumption that the language of SES might be complex on certain linguistic levels (e.g. syntax) (Halliday 1979) and might be simple on others (e.g. lexis) in comparison to written discourse, the thesis sets out to investigate this complexity using a statistical approach to the computation of the structures recurrent in the language of INT and CON. The findings indicate clearly the presence of linguistic complexity in both types. They also show the language of INT to be slightly more syntactically and lexically complex than that of CON. Lexical density seems to be relatively high in both types of spoken discourse. The language of INT seems to be more complex than that of CON on the level of information structure too. This is manifested in the greater use of Inferable and other linguistically complex entities of discourse. Halliday's suggestion that the language of SES is syntactically complex is confirmed, but not his suggestion that the more casual the conversation is, the more syntactically complex it becomes. The results of the analysis point to the general conclusion that the linguistic complexity of types of SES lies not only in the high recurrence of syntactic structures, but also in the combination of these features with each other and with other linguistic and extralinguistic features.
The linguistic analysis of the language of SES can be useful in understanding and pinpointing the intricacies of spoken discourse in general and will help discourse analysts and applied linguists in exploiting it both for theoretical and pedagogical purposes.
Abstract:
The study examines factors influencing language planning decisions in contemporary France. It focuses upon the period 1992-1994, which witnessed the introduction of two major language policy measures, the first an amendment to the French Constitution, in 1992, proclaiming the language of the Republic as French, the second, in 1994, legislation to extend the ambit of the loi Bas-Lauriol, governing the use of the French language in France. The thesis posits a significant role for the pro-reform movement led by the French language association Avenir de la Langue Française (ALF) in the introduction and formulation of the policy measures concerned. The movement is depicted as continuing the traditional pattern of intellectual involvement in language planning, whilst also marking the beginning of a highly proactive, and increasingly political approach. Detailed examination of the movement's activities reveals that contextual factors and strategic strength combined to facilitate access to the levers of power, and enabled those involved to exert an impact on policy initiation, formulation, and ultimately implementation. However, ALF's decision to pursue the legislative route led to the expansion of the network of actors involved in language policymaking, and the development of counter-pressure from sectoral groups. It is suggested that this more interventionist approach destabilised the traditionally consensual language policy community, and called into question the quasi-monopoly of the intelligentsia in respect of language policymaking. It raised broader questions relating to freedom of expression and the permissible limits of language regulation in a democracy such as France. It also exposed ongoing ambiguities and inconsistencies in the interpretation of the tenets of language planning.
Abstract:
This thesis describes work undertaken in order to fulfil a need experienced in the Department of Educational Enquiry at the University of Aston in Birmingham for speech analysis facilities suitable for use in teaching and research work within the Department. The hardware and software developed during the research project provides displays of speech fundamental frequency and intensity in real time. The system is suitable for the provision of visual feedback of these parameters of a subject's speech in a learning situation, and overcomes the inadequacies of equipment currently used for this task in that it provides a clear indication of fundamental frequency contours as the subject is speaking. The thesis considers the use of such equipment in several related fields, and the approaches that have been reported to one of the major problems of speech analysis, namely pitch-period estimation. A number of different systems are described, and their suitability for the present purposes is discussed. Finally, a novel method of pitch-period estimation is developed, and a speech analysis system incorporating this method is described. Comparison is made between the results produced by this system and those produced by a conventional speech spectrograph.
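To make the pitch-period estimation problem mentioned above concrete, the following is a minimal sketch of one conventional approach, time-domain autocorrelation. It is not the novel method developed in the thesis, which the abstract does not describe; the function names, the 8 kHz sample rate and the 50-400 Hz search range are assumptions chosen only for illustration.

```python
import math


def estimate_f0(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) by picking the autocorrelation peak.

    The pitch period is taken as the lag, within the range implied by
    f_min and f_max, at which the signal best matches a shifted copy of itself.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    if max_lag >= len(samples):
        raise ValueError("need more samples than the longest candidate period")

    def autocorr(lag):
        # Unnormalised autocorrelation at the given lag.
        return sum(samples[n] * samples[n + lag]
                   for n in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag


def sine_wave(freq_hz, sample_rate, n_samples):
    """Generate a pure tone as a stand-in for a voiced speech frame."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


if __name__ == "__main__":
    frame = sine_wave(200.0, 8000, 800)  # 100 ms of a 200 Hz tone
    print("estimated F0 (Hz):", estimate_f0(frame, 8000))
```

Real speech is far less regular than a pure tone, so practical systems usually add windowing, normalisation and voicing decisions around a core like this one.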