22 results for Wakabayashi, Mel
in Queensland University of Technology - ePrints Archive
Abstract:
The effectiveness of higher-order spectral (HOS) phase features in speaker recognition is investigated by comparison with Mel-cepstral features on the same speech data. Unlike Mel-frequency cepstral coefficients (MFCCs), HOS phase features retain phase information from the Fourier spectrum. Gaussian mixture models are constructed from Mel-cepstral features and HOS features, respectively, for the same data from various speakers in the Switchboard telephone speech corpus. Feature clusters, model parameters and classification performance are analyzed. HOS phase features on their own provide a correct identification rate of about 97% on the chosen subset of the corpus. This is the same level of accuracy as provided by MFCCs. Cluster plots and model parameters are compared to show that HOS phase features can provide complementary information to better discriminate between speakers.
Abstract:
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating non-critical in-car systems. Likelihood-maximising (LIMA) frameworks optimise speech enhancement algorithms based on recognised state sequences rather than traditional signal-level criteria such as maximising signal-to-noise ratio. Previously presented LIMA frameworks require calibration utterances to generate optimised enhancement parameters which are used for all subsequent utterances. Sub-optimal recognition performance occurs in noise conditions which are significantly different from that present during the calibration session - a serious problem in rapidly changing noise environments. We propose a dialog-based design which allows regular optimisation iterations in order to track the changing noise conditions. Experiments using Mel-filterbank spectral subtraction are performed to determine the optimisation requirements for vehicular environments and show that minimal optimisation assists real-time operation with improved speech recognition accuracy. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session.
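As a rough illustration of the subtractive enhancement this abstract refers to, the sketch below removes a per-channel noise estimate from one frame of Mel-filterbank energies. The function name, the over-subtraction factor `alpha` and the floor fraction `beta` are assumptions made for this example; the abstract does not specify the parameterisation the authors used.

```python
def filterbank_noise_subtraction(noisy_energies, noise_estimate, alpha=1.0, beta=0.01):
    """Subtract a per-channel noise estimate from Mel-filterbank energies.

    alpha scales the noise estimate (over-subtraction factor) and beta sets
    a floor as a fraction of the noisy energy, so results never go negative.
    Both parameters are illustrative assumptions, not taken from the abstract.
    """
    if len(noisy_energies) != len(noise_estimate):
        raise ValueError("channel counts must match")
    enhanced = []
    for noisy, noise in zip(noisy_energies, noise_estimate):
        floor = beta * noisy
        enhanced.append(max(noisy - alpha * noise, floor))
    return enhanced


# One frame with three filterbank channels (made-up values).
frame = [4.0, 2.0, 1.0]
noise = [1.0, 1.5, 2.0]
enhanced = filterbank_noise_subtraction(frame, noise, alpha=1.0, beta=0.1)
```

In a likelihood-maximising setting, values such as `alpha` would be the kind of enhancement parameters tuned against the recogniser rather than against signal-to-noise ratio.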
Abstract:
Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first formant center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel-frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting, together with the acoustic analysis, that it is possible to monitor the level of driver distraction directly from their speech.
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.
Abstract:
For several reasons, the Fourier phase domain is less favored than the magnitude domain in signal processing and modeling of speech. To correctly analyze the phase, several factors must be considered and compensated, including the effect of the step size, windowing function and other processing parameters. Building on a review of these factors, this paper investigates a spectral representation based on the Instantaneous Frequency Deviation, but in which the step size between processing frames is used in calculating phase changes, rather than the traditional single sample interval. Reflecting these longer intervals, the term delta-phase spectrum is used to distinguish this from instantaneous derivatives. Experiments show that mel-frequency cepstral coefficient features derived from the delta-phase spectrum (termed Mel-Frequency delta-phase features) can produce broadly similar performance to equivalent magnitude domain features for both voice activity detection and speaker recognition tasks. Further, it is shown that the fusion of the magnitude and phase representations yields performance benefits over either in isolation.
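The idea of measuring phase change over the frame step rather than over a single sample can be sketched as follows. This is only an illustration under stated assumptions: it uses a rectangular window and a direct DFT, and it subtracts the phase advance expected at the bin centre frequency, which follows the usual notion of a frequency deviation but is not confirmed by the abstract as the paper's exact formulation. The names `dft_bin`, `wrap_phase` and `delta_phase` are hypothetical.

```python
import cmath
import math


def dft_bin(frame, k):
    """Return DFT coefficient k of a real-valued frame (rectangular window)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))


def wrap_phase(p):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (p + math.pi) % (2 * math.pi) - math.pi


def delta_phase(signal, frame_len, hop, k):
    """Phase change at bin k between consecutive frames spaced hop samples apart.

    The advance expected for a sinusoid at the bin centre frequency is removed,
    so a tone exactly on the bin gives values close to zero.
    """
    expected = 2 * math.pi * k * hop / frame_len
    phases = [
        cmath.phase(dft_bin(signal[start:start + frame_len], k))
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    return [wrap_phase(b - a - expected) for a, b in zip(phases, phases[1:])]


# A cosine centred exactly on bin 2 of a 16-point frame.
frame_len, hop, k = 16, 4, 2
signal = [math.cos(2 * math.pi * k * n / frame_len) for n in range(32)]
deviations = delta_phase(signal, frame_len, hop, k)
```

For a tone sitting exactly on the analysed bin, every entry of `deviations` is close to zero; a tone slightly off the bin centre would show a consistent non-zero deviation from frame to frame.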
Abstract:
This paper investigates the use of mel-frequency delta-phase (MFDP) features in comparison to, and in fusion with, traditional mel-frequency cepstral coefficient (MFCC) features within joint factor analysis (JFA) speaker verification. MFCC features, commonly used in speaker recognition systems, are derived purely from the magnitude spectrum, with the phase spectrum completely discarded. In this paper, we investigate whether features derived from the phase spectrum can provide additional speaker discriminant information to the traditional MFCC approach in a JFA-based speaker verification system. Results are presented which provide a comparison of MFCC-only, MFDP-only and score fusion of the two approaches within a JFA speaker verification approach. Based upon the results presented using the NIST 2008 Speaker Recognition Evaluation (SRE) dataset, we believe that, while MFDP features alone cannot compete with MFCC features, MFDP can provide complementary information that results in improved speaker verification performance when both approaches are combined in score fusion, particularly in the case of shorter utterances.
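Score fusion of two verification systems is often done as a weighted sum of their per-trial scores. The sketch below shows that simple linear form; the abstract does not say which fusion rule or weights the authors used, so the function name `fuse_scores` and the `weight` parameter are assumptions for illustration only.

```python
def fuse_scores(scores_a, scores_b, weight=0.5):
    """Linearly combine per-trial scores from two systems.

    weight is the share given to system A; the remainder goes to system B.
    This linear rule is an illustrative assumption, not the paper's method.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


# Made-up scores for two trials from an MFCC system and an MFDP system.
mfcc_scores = [1.0, -1.0]
mfdp_scores = [3.0, 1.0]
fused = fuse_scores(mfcc_scores, mfdp_scores, weight=0.5)
```

In practice the weight would normally be chosen on held-out development trials, and scores from the two systems would need to be on comparable scales before being combined.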
Abstract:
This chapter outlines the reasons why discourse analysis is an important dimension of critical social work practice. It brings to the forefront the very significant new contributions that sociologists focusing on the politics of recognition and redistribution, such as Nancy Fraser and Axel Honneth, can make in casting a "new politics of critical social work". In making this case, it begins by discussing some key developments in discourse theory and analysis within the social sciences and how they relate to the normative concerns of social work, specifically social justice and its multiple interpretations. Developing an appropriate analytical framework for social work practice can be difficult because there are conflicting and overlapping definitions of discourse formulated from various theoretical and disciplinary standpoints (Fairclough, 1992; Macdonnell, 1991). There are many different accounts of discourse that have developed in the social sciences, which is partly a result of recent interest in discourse theory among a wide range of academic disciplines. Whether language has assumed more of a central focus as a result of increased academic interest, or whether there has been an increase in the social importance of language in the operations of power is open to question...
Abstract:
We present a framework and first set of simulations for evolving a language for communicating about space. The framework comprises two components: (1) an established mobile robot platform, RatSLAM, which has a "brain" architecture based on rodent hippocampus with the ability to integrate visual and odometric cues to create internal maps of its environment; and (2) a language learning system based on a neural network architecture that has been designed and implemented with the ability to evolve generalizable languages which can be learned by naive learners. A study using visual scenes and internal maps streamed from the simulated world of the robots to evolve languages is presented. This study investigated the structure of the evolved languages, showing that with these inputs, expressive languages can effectively categorize the world. Ongoing studies are extending these investigations to evolve languages that use the full power of the robots' representations in populations of agents.
Abstract:
Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating noncritical in-car systems. Under such conditions, however, speech recognition accuracy degrades significantly, and techniques such as speech enhancement are required to improve these accuracies. Likelihood-maximizing (LIMA) frameworks optimize speech enhancement algorithms based on recognized state sequences rather than traditional signal-level criteria such as maximizing signal-to-noise ratio. LIMA frameworks typically require calibration utterances to generate optimized enhancement parameters that are used for all subsequent utterances. Under such a scheme, suboptimal recognition performance occurs in noise conditions that are significantly different from that present during the calibration session – a serious problem in rapidly changing noise environments out on the open road. In this chapter, we propose a dialog-based design that allows regular optimization iterations in order to track the ever-changing noise conditions. Experiments using Mel-filterbank noise subtraction (MFNS) are performed to determine the optimization requirements for vehicular environments and show that minimal optimization is required to improve speech recognition, avoid over-optimization, and ultimately assist with semireal-time operation. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session only.
Abstract:
As a large, isolated and relatively ancient landmass, New Zealand occupies a unique place in the biological world, with distinctive terrestrial biota and a high proportion of primitive endemic forms. Biology Aotearoa covers the origins, evolution and conservation of the New Zealand flora, fauna and fungi. Each chapter is written by specialists in the field, often working from different perspectives to build up a comprehensive picture. Topics include: the geological history of our land; origins and evolution of our plants, animals and fungi; current status of rare and threatened species; past, present and future management of native species; and the effect of human immigration on the native biota. Colour diagrams and photographs are used throughout the text. This book is suitable for all students of biology or ecology who wish to know about the unique nature of Aotearoa New Zealand and its context in the biological world.
Abstract:
This important new book draws lessons from a large-scale initiative to bring about the improvement of an urban education system. Written from an insider perspective by an internationally recognized researcher, it presents a new way of thinking about system change. This builds on the idea that there are untapped resources within schools and the communities they serve that can be mobilized in order to transform schools from places that do well for some children so that they can do well for many more. Towards Self-improving School Systems presents a strategic framework that can help to foster new, more fruitful working relationships: between national and local government; within and between schools; and between schools and their local communities. What is distinctive in the approach is that this is mainly led from within schools, with senior staff having a central role as system leaders. The book will be relevant to a wide range of readers throughout the world who are concerned with the strengthening of their national educational systems, including teachers, school leaders, policy makers and researchers. The argument it presents is particularly important for the growing number of countries where increased emphasis on school autonomy, competition and choice is leading to fragmentation within education provision.
Abstract:
The Older Australian Twins Study (OATS) was recently initiated to investigate genetic and environmental factors and their associations and interactions in healthy brain ageing and ageing-related neurocognitive disorders. The study extends the classic MZ-DZ design to include one or two equivalently aged siblings for each twin pair and utilizes the rich resources of the Australian Twin Registry. The study has a number of distinguishing features including comprehensive psychiatric, neuropsychological, cardiovascular, metabolic, and neuroimaging assessments, a longitudinal design and links with a brain donor program. The study measures many behavioral and environmental factors, but in particular lifetime physical and mental activity, physical and psychological trauma, loss of parent early in life, later losses and life events, early-life socioeconomic environment, alcohol and drug use, occupational exposure, and nutrition. It also includes comprehensive cardiovascular assessment, blood biochemistry, genetics and proteomics. The socio-demographic and health data on the first 172 pairs of twins participating in this study are presented. Prevalence of mild cognitive impairment is 12.8% and of dementia 1.5% in the sample. The target sample size is 1000, with at least 400 pairs of twins aged 65-90 years. The cohort will be assessed every two years, with in-depth assessments being repeated. OATS offers an excellent opportunity for collaboration with other similar studies as well as researchers who share the same interests.