336 results for Speech interaction
in Queensland University of Technology - ePrints Archive
Abstract:
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating noncritical in-car systems. Under such conditions, however, speech recognition accuracy degrades significantly, and techniques such as speech enhancement are required to improve these accuracies. Likelihood-maximizing (LIMA) frameworks optimize speech enhancement algorithms based on recognized state sequences rather than traditional signal-level criteria such as maximizing signal-to-noise ratio. LIMA frameworks typically require calibration utterances to generate optimized enhancement parameters that are used for all subsequent utterances. Under such a scheme, suboptimal recognition performance occurs in noise conditions that are significantly different from those present during the calibration session, a serious problem in rapidly changing noise environments on the open road. In this chapter, we propose a dialog-based design that allows regular optimization iterations in order to track the ever-changing noise conditions. Experiments using Mel-filterbank noise subtraction (MFNS) are performed to determine the optimization requirements for vehicular environments and show that minimal optimization is required to improve speech recognition, avoid over-optimization, and ultimately assist with semi-real-time operation. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session only.
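As a rough illustration of the kind of enhancement step this abstract refers to, the sketch below applies a generic spectral-subtraction-style rule to one frame of Mel-filterbank energies. It is not the authors' MFNS formulation: the function name `mel_noise_subtraction` and the parameters `alpha` (subtraction weight) and `floor` (residual floor) are assumptions made for this example. In a LIMA-style framework, parameters of this kind would be the ones tuned against recognizer likelihood rather than signal-to-noise ratio.

```python
def mel_noise_subtraction(mel_energies, noise_estimate, alpha=1.0, floor=0.01):
    """Subtract a scaled noise estimate from one frame of Mel-filterbank energies.

    alpha and floor are hypothetical tuning parameters for this sketch.
    Each output bin is kept at or above floor * (noisy energy) so that
    energies never become negative.
    """
    if len(mel_energies) != len(noise_estimate):
        raise ValueError("frame and noise estimate must have the same number of bins")
    cleaned = []
    for energy, noise in zip(mel_energies, noise_estimate):
        subtracted = energy - alpha * noise
        cleaned.append(max(subtracted, floor * energy))
    return cleaned


# One toy frame with three filterbank bins and a flat noise estimate.
frame = [10.0, 4.0, 1.0]
noise = [2.0, 2.0, 2.0]
enhanced = mel_noise_subtraction(frame, noise, alpha=1.0, floor=0.1)
```

The floor prevents the last bin, where the noise estimate exceeds the observed energy, from going negative.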
Abstract:
In this paper, cognitive load analysis via acoustic- and CAN-Bus-based driver performance metrics is employed to assess two different commercial speech dialog systems (SDS) during in-vehicle use. Several metrics are proposed to measure increases in stress, distraction and cognitive load and we compare these measures with statistical analysis of the speech recognition component of each SDS. It is found that care must be taken when designing an SDS as it may increase cognitive load which can be observed through increased speech response delay (SRD), changes in speech production due to negative emotion towards the SDS, and decreased driving performance on lateral control tasks. From this study, guidelines are presented for designing systems which are to be used in vehicular environments.
Abstract:
Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first formant center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting, together with the acoustic analysis, that it is possible to monitor the level of driver distraction directly from their speech.
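The classification step described in this abstract can be pictured with a minimal sketch of Gaussian mixture scoring: one diagonal-covariance GMM per class, with the class chosen by the higher total log-likelihood over the utterance frames. The model parameters, the two-dimensional "features", and the labels `co_driver` and `sds` below are toy assumptions for illustration. They are not the study's trained models or its MFCC front end.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total


def classify(frames, models):
    """Return the label whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Toy models (assumed values, not trained on any data).
models = {
    "co_driver": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "sds": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])],
}
utterance = [[3.1, 2.9], [3.8, 4.1]]
predicted = classify(utterance, models)
```

In practice the frames would be cepstral feature vectors and the mixtures would be trained on labeled sessions; only the decision rule is shown here.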
Abstract:
The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory.
Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions and the emphasis of both positive and negative outcomes allows the benefits and pitfalls of design decisions to become apparent.
A brief description of the Court is given, describing the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: front of house, where the courtroom itself is, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems.
Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic.
The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.
Abstract:
Non-driving-related cognitive load and variations of emotional state may impair a driver’s capability to control a vehicle and introduce driving errors. Availability of reliable cognitive load and emotion detection in drivers would benefit the design of active safety systems and other intelligent in-vehicle interfaces. In this study, speech produced by 68 subjects while driving in urban areas is analyzed. A particular focus is on speech production differences in two secondary cognitive tasks, interactions with a co-driver and calls to automated spoken dialog systems (SDS), and in two emotional states during the SDS interactions, neutral and negative. A number of speech parameters are found to vary across the cognitive/emotion classes. Suitability of selected cepstral- and production-based features for automatic cognitive task/emotion classification is investigated. A fusion of GMM/SVM classifiers yields an accuracy of 94.3% in cognitive task and 81.3% in emotion classification.
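The abstract does not say how the GMM and SVM outputs are fused. One common option is score-level fusion, sketched below under that assumption: each classifier's per-class scores are mapped to a comparable scale with a softmax and then combined by a weighted sum. The function names, the `weight` parameter, and the example scores are all hypothetical.

```python
import math


def softmax(scores):
    """Map raw per-class scores to values that sum to 1."""
    peak = max(scores.values())
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def fuse_scores(gmm_scores, svm_scores, weight=0.5):
    """Weighted sum of normalised scores; weight is the share given to the GMM."""
    g = softmax(gmm_scores)
    s = softmax(svm_scores)
    fused = {label: weight * g[label] + (1.0 - weight) * s[label] for label in g}
    best = max(fused, key=fused.get)
    return best, fused


# Assumed example scores: GMM log-likelihoods and SVM decision values.
gmm_scores = {"neutral": -10.0, "negative": -12.0}
svm_scores = {"neutral": -0.2, "negative": 0.4}
label, fused = fuse_scores(gmm_scores, svm_scores, weight=0.5)
```

Here the two classifiers disagree, and the more confident GMM decides the fused result; a different `weight` could reverse that.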
Abstract:
In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross-correlation-based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
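Cross-correlation-based localisation generally starts from the time delay between two microphone signals. The sketch below estimates that delay by brute-force cross-correlation over a range of lags. It is a generic illustration, not the system described in the abstract: the function name `estimate_delay`, the `max_lag` parameter, and the toy signals are assumptions.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += sample * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signals: the second channel is the first one shifted by 3 samples.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)
```

With a known sampling rate and microphone geometry, a delay like this can be converted into a direction of arrival; in the abstract's setting the geometry itself is estimated rather than given.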
Abstract:
Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey aims to provide an overview of state-of-the-art developments in these areas. We begin by discussing what tagging means in each sound area. Some examples of sound tagging applications are introduced in order to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech, and environmental sound tagging, we compare them and state the research progress to date. Research gaps are identified for each research area, and the common features and differences among the three areas are identified as well. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. Finally, we summarise the worldwide distribution of countries active in sound tagging research.
Abstract:
This study investigates the development of teacher identity in a transnational context through an analysis of the voices of sixteen preservice teachers from Hong Kong who engage in interaction with primary students in an Australian classroom. The context for this research is the school-based experience undertaken by these preservice English as a second language teachers as part of their short language immersion (SLIM) program in Brisbane, Australia. Such SLIM programs are a genre of study abroad programs which have been gaining in popularity within teacher education in Australia, attended by preservice and inservice teachers from China, Hong Kong, Korea, and other Asian countries. This research is conducted at a time when the imperative to globalise higher education provision is a strategic factor in the educational policies of both Australia and Hong Kong. In Australia, international educational services now constitute the country’s third largest export with more than 400,000 students coming to Australia to study annually. In order to maintain Australia’s current global position as the third most popular English-speaking study destination, the government is now focusing on sustainability and the quality of the study experience being offered to international students (Bradley Review, 2008). In Hong Kong, the government sponsors both preservice and inservice English as a second language (ESL) teachers to undertake SLIM programs in Australia and other English-speaking countries, as part of their policy of promoting high levels of English proficiency in Hong Kong classrooms. Transnational teacher education is an important issue to which this study contributes insights into the affordances and constraints of a school-based experience in the transnational context. Second language teacher education has been defined as interventions designed to develop participants’ professional knowledge.
In this study, it is argued that participation in a different community of practice helps to foreground tacit theories of second language pedagogy, making them visible and open to review. Questions of pedagogy are also seen as questions of teacher identity, constituting the way that one is in the classroom. I take up a sociocultural and poststructural framework, drawing on the work of James Gee and Mikhail Bakhtin, to theorise the construction of teacher identity as emerging through dialogic relations and socially situated discursive practices. From this perspective, this study investigates whether these teachers engage with different ways of representing themselves through appropriating, adapting or rejecting Discourses prevailing in the Australian classroom. Research suggests that reflecting on dilemmas encountered as lived experiences can extend professional understandings. In this study, the participants engage in a process of dialogic reflection on their intercultural classroom interactions, examining with their peers and their lecturer/researcher selected moments of dissonance that they have faced in the unfamiliar context of an Australian primary classroom. It is argued that the recursive and multivoiced nature of this process of reflection on practice allows participants opportunities to negotiate new understandings of second language teacher identity. Dialogic learning, based on the theories of Bakhtin and Vygotsky, provides the theoretic framing not only for the process of reflection instantiated in this study, but also features in the analysis of the participants’ second language classroom practices. The research design uses a combined discourse analytic and ethnographic approach as a logic-of-inquiry to explore the dialogic relationships which these second language teachers negotiate with their students and their peers in the transnational context. 
In this way, through discourse analysis of their classroom talk and reflective dialogues, assisted by the analytic tools of speech genres and discourse formats, I explore the participants’ ways of doing and being second language teachers. Thus, this analysis traces the process of ideological becoming of these beginner teachers as shifts in their understandings of teacher and student identities. This study also demonstrates the potential for a nontraditional stimulated recall interview to provide dialogic scaffolding for beginner teachers to reflect productively on their practice.