7 results for Speech Dialog Systems
in Aston University Research Archive
Abstract:
The present thesis focuses on the overall structure of the language of two types of Speech Exchange Systems (SES): Interview (INT) and Conversation (CON). The linguistic structure of INT and CON is quantitatively investigated on three different but interrelated levels of analysis: Lexis, Syntax and Information Structure. The corpus of data investigated for the project consists of eight sessions of pairs of conversants in carefully planned interviews followed by unplanned, surreptitiously recorded conversational encounters of the same pairs of speakers. The data comprise a total of approximately 15,200 words of INT talk and about 19,200 words of CON talk. Taking account of the debatable assumption that the language of SES might be complex on certain linguistic levels (e.g. syntax) (Halliday 1979) and simple on others (e.g. lexis) in comparison with written discourse, the thesis sets out to investigate this complexity using a statistical approach to the computation of the structures recurrent in the language of INT and CON. The findings clearly indicate the presence of linguistic complexity in both types. They also show the language of INT to be slightly more syntactically and lexically complex than that of CON. Lexical density seems to be relatively high in both types of spoken discourse. The language of INT also seems to be more complex than that of CON on the level of information structure, manifested in the greater use of Inferable and other linguistically complex entities of discourse. Halliday's suggestion that the language of SES is syntactically complex is confirmed, but not his suggestion that the more casual the conversation, the more syntactically complex it becomes. The results of the analysis point to the general conclusion that the linguistic complexity of types of SES lies not only in the high recurrence of syntactic structures, but also in the combination of these features with each other and with other linguistic and extralinguistic features. The linguistic analysis of the language of SES can be useful in understanding and pinpointing the intricacies of spoken discourse in general, and will help discourse analysts and applied linguists to exploit it for both theoretical and pedagogical purposes.
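To make the notion of lexical density concrete, here is a minimal sketch, not drawn from the thesis itself, of how such a measure can be computed over a transcript; the small stopword list standing in for the set of grammatical (function) words is an illustrative assumption.

```python
# Sketch of a lexical-density measure of the kind computed over INT and CON
# transcripts: the proportion of content (lexical) words among all word tokens.
# The FUNCTION_WORDS set is a toy stand-in for a full list of function words.
import re

FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "are", "was", "were", "be", "i", "you", "he", "she", "it",
    "we", "they", "that", "this", "with", "for", "as", "not", "do",
}

def lexical_density(transcript: str) -> float:
    """Proportion of content words among all word tokens in a transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)

# Example: a planned, interview-style utterance versus a casual one.
print(lexical_density("The statistical analysis of spoken discourse reveals complexity"))
print(lexical_density("well you know it was sort of okay I think"))
```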
Abstract:
This thesis addresses the viability of automatic speech recognition for control room systems; with careful system design, automatic speech recognition (ASR) devices can be a useful means of human-computer interaction for specific types of task. These tasks can be defined as complex verbal activities, such as command and control, and can be paired with spatial tasks, such as monitoring, without detriment. It is suggested that ASR use be confined to routine plant operation, as opposed to critical incidents, due to possible problems of stress on the operators' speech. It is proposed that using ASR will require operators to adapt a commonly used skill to cater for a novel use of speech. Before using the ASR device, new operators will require some form of training. It is shown that a demonstration by an experienced user of the device can lead to better performance than instructions. Thus, a relatively cheap and very efficient form of operator training can be supplied by demonstration by experienced ASR operators. From a series of studies into speech-based interaction with computers, it is concluded that the interaction should be designed to capitalise upon the tendency of operators to use short, succinct, task-specific styles of speech. From studies comparing different types of feedback, it is concluded that operators should be given screen-based feedback, rather than auditory feedback, for control room operation. Feedback will take two forms: use of the ASR device will require recognition feedback, which is best supplied as text; performance of a process control task will require task feedback integrated into the mimic display. This latter feedback can be either textual or symbolic, but it is suggested that symbolic feedback will be more beneficial. Related to both interaction style and feedback is the issue of handling recognition errors. These should be corrected by simple command repetition, rather than by error-handling dialogues; this method of error correction is held to be non-intrusive to primary command and control operations. The thesis also addresses some of the problems of user error in ASR use, and provides a number of recommendations for its reduction.
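As an illustration only, and not the system described in the thesis, the sketch below shows an interaction loop built around the recommendations summarised above: short task-specific commands, recognition feedback echoed as on-screen text, and error correction by simple repetition rather than an error-handling dialogue. `recognise` and `execute` are hypothetical stand-ins for an ASR engine call and the plant interface.

```python
# Sketch of repetition-based error handling: a misrecognised command is simply
# ignored and the operator repeats it; no separate error-handling dialogue.
from typing import Callable

VALID_COMMANDS = {"open valve three", "close valve three", "show boiler trend", "stop"}

def command_loop(recognise: Callable[[], str], execute: Callable[[str], None]) -> None:
    while True:
        candidate = recognise()            # one spoken command attempt
        print(f"heard: {candidate}")       # recognition feedback, supplied as screen text
        if candidate not in VALID_COMMANDS:
            continue                       # operator simply repeats the command
        if candidate == "stop":
            break
        execute(candidate)                 # task feedback would appear on the mimic display

# Example with a scripted "recogniser" that misrecognises once and is then repeated.
utterances = iter(["open valve tree", "open valve three", "stop"])
command_loop(lambda: next(utterances), lambda cmd: print(f"executed: {cmd}"))
```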
Abstract:
This paper examines the connected speech process described by Wells (1982b) as the T to R rule in the West Midlands speech variety associated with the Black Country. The T to R rule is well known as a linguistic marker of local varieties of the middle and far north of England. Less well understood is its position in the phonological systems of Midlands varieties. Varieties of the Midlands of England are under-researched in comparison with varieties of the north, and what is known about the application of the T to R rule in this transitional dialect area is correspondingly nebulous. This paper focuses on the Black Country area, and examines the possible outputs in the contexts which give rise to /t/ becoming [ɾ] in local varieties of the north. I examine the written and spoken evidence which suggests that the T to R rule does indeed operate in the Black Country variety. My analysis focuses on possible phonetic outcomes of the T to R rule across time. In my conclusion, I discuss briefly the possibility that, lying on a bundle of isoglosses separating north from south, the variety of the Black Country reflects this in that a T to [ʔ] rule, rather than a T to R rule, is the dominant output of this connected speech process in the Black Country.
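Purely as an illustration of the alternation under discussion, and not an analysis from the paper, the toy rewrite below applies the rule to orthographic strings; the simplified context check (any word-final t before a vowel-initial word) is an assumption made for the example, not the rule's actual conditioning environment.

```python
# Toy rewrite: word-final t before a vowel-initial word surfaces as an r-like
# tap in northern varieties (shut up -> "shur up"), or, on the analysis argued
# for the Black Country, as a glottal stop.
import re

def t_rule(phrase: str, output: str = "r") -> str:
    """Rewrite word-final 't' before a vowel-initial word with the given output."""
    pattern = re.compile(r"t\b(?=\s+[aeiou])")
    return pattern.sub(output, phrase)

print(t_rule("shut up"))               # shur up  (T to R)
print(t_rule("get off", output="ʔ"))   # geʔ off  (T to glottal stop)
```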
Abstract:
Cellular mobile radio systems will be of increasing importance in the future. This thesis describes research work concerned with the teletraffic capacity and the computer control requirements of such systems. The work involves theoretical analysis and experimental investigations using digital computer simulation. New formulas are derived for the congestion in single-cell systems in which there are both land-to-mobile and mobile-to-mobile calls and in which mobile-to-mobile calls go via the base station. Two approaches are used: the first yields modified forms of the familiar Erlang and Engset formulas, while the second gives more complicated but more accurate formulas. The results of computer simulations to establish the accuracy of the formulas are described. New teletraffic formulas are also derived for the congestion in multi-cell systems. Fixed, dynamic and hybrid channel assignments are considered. The formulas agree with previously published simulation results. Simulation programs are described for the evaluation of the speech traffic of mobiles and for the investigation of a possible computer network for the control of the speech traffic. The programs were developed according to the structured programming approach, leading to programs of modular construction. Two simulation methods are used for the speech traffic: the roulette method and the time-true method. The first is economical but has some restrictions, while the second is expensive but gives comprehensive answers. The proposed control network operates at three hierarchical levels, performing various control functions which include the setting-up and clearing-down of calls, the hand-over of calls between cells and the address-changing of mobiles travelling between cities. The results demonstrate the feasibility of the control network and indicate that small mini-computers interconnected via voice-grade data channels would be capable of providing satisfactory control.
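For orientation, the classic Erlang B congestion formula that such derivations take as their starting point can be evaluated with the standard recursion; the sketch below shows the unmodified formula only, not the modified forms derived in the thesis.

```python
# Classic Erlang B blocking (congestion) probability, evaluated with the
# standard recursion B_0 = 1, B_k = E*B_{k-1} / (k + E*B_{k-1}), which is
# numerically stable even for large channel counts.

def erlang_b(offered_traffic_erlangs: float, channels: int) -> float:
    """Blocking probability for traffic offered to a group of `channels` servers."""
    b = 1.0
    for k in range(1, channels + 1):
        b = (offered_traffic_erlangs * b) / (k + offered_traffic_erlangs * b)
    return b

# Example: a single cell with 30 channels offered 25 erlangs of speech traffic.
print(f"blocking probability: {erlang_b(25.0, 30):.4f}")
```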
Abstract:
This thesis describes work undertaken to meet the need of the Department of Educational Enquiry at the University of Aston in Birmingham for speech analysis facilities suitable for use in teaching and research work within the Department. The hardware and software developed during the research project provide displays of speech fundamental frequency and intensity in real time. The system is suitable for providing visual feedback of these parameters of a subject's speech in a learning situation, and overcomes the inadequacies of equipment currently used for this task in that it provides a clear indication of fundamental frequency contours as the subject is speaking. The thesis considers the use of such equipment in several related fields, and the approaches that have been reported to one of the major problems of speech analysis, namely pitch-period estimation. A number of different systems are described, and their suitability for the present purposes is discussed. Finally, a novel method of pitch-period estimation is developed, and a speech analysis system incorporating this method is described. Comparison is made between the results produced by this system and those produced by a conventional speech spectrograph.
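As a pointer to the kind of pitch-period estimation the thesis surveys, and not the novel method it develops, here is a minimal autocorrelation-based sketch: pick the lag at which the short-time autocorrelation of a frame peaks and convert it to a fundamental frequency.

```python
# Simple autocorrelation pitch estimator, illustrating the general technique only.
import numpy as np

def estimate_f0(frame: np.ndarray, sample_rate: int,
                f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Return an estimate of fundamental frequency (Hz) for one speech frame."""
    frame = frame - frame.mean()
    autocorr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sample_rate / f0_max)          # shortest plausible pitch period
    lag_max = int(sample_rate / f0_min)          # longest plausible pitch period
    best_lag = lag_min + int(np.argmax(autocorr[lag_min:lag_max]))
    return sample_rate / best_lag

# Example: a synthetic 120 Hz tone sampled at 8 kHz should come back near 120 Hz.
sr = 8000
t = np.arange(0, 0.03, 1 / sr)
print(estimate_f0(np.sin(2 * np.pi * 120 * t), sr))
```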
Abstract:
The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.
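For readers weighing the engine and microphone pairings compared in such a study, the usual yardstick is word error rate; the sketch below computes it by Levenshtein alignment of a recognised transcript against a reference. The engine names and transcripts are illustrative only, not data from the paper.

```python
# Word error rate (WER) via dynamic-programming edit distance over word tokens.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical comparison of two engines against one reference utterance.
reference = "record blood pressure one twenty over eighty"
hypotheses = {"engine_a": "record blood pressure one twenty over eighty",
              "engine_b": "record blood pressure went twenty over eighty"}
for engine, hypothesis in hypotheses.items():
    print(engine, f"{word_error_rate(reference, hypothesis):.2f}")
```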