997 resultados para stage speech
Resumo:
Non-driving related cognitive load and variations of emotional state may impact a driver’s capability to control a vehicle and introduces driving errors. Availability of reliable cognitive load and emotion detection in drivers would benefit the design of active safety systems and other intelligent in-vehicle interfaces. In this study, speech produced by 68 subjects while driving in urban areas is analyzed. A particular focus is on speech production differences in two secondary cognitive tasks, interactions with a co-driver and calls to automated spoken dialog systems (SDS), and two emotional states during the SDS interactions - neutral/negative. A number of speech parameters are found to vary across the cognitive/emotion classes. Suitability of selected cepstral- and production-based features for automatic cognitive task/emotion classification is investigated. A fusion of GMM/SVM classifiers yields an accuracy of 94.3% in cognitive task and 81.3% in emotion classification.
Resumo:
Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recognition systems, for voice-based control of vehicle functions such as the GPS based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (eg: lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in AVASR field has been ongoing for the past twenty-five years with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is due to most research to date neglecting to address variabilities in the visual domain such as illumination and viewpoint in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is publicly available in-car database and we show that the use of visual speech conjunction with the audio modality is a better approach to improve the robustness and effectiveness of voice-only recognition systems in car cabin environments.
Resumo:
The detection of voice activity is a challenging problem, especially when the level of acoustic noise is high. Most current approaches only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to overcome this is to use the visual modality. The current state-of-the-art visual feature extraction technique is one that uses a cascade of visual features (i.e. 2D-DCT, feature mean normalisation, interstep LDA). In this paper, we investigate the effectiveness of this technique for the task of visual voice activity detection (VAD), and analyse each stage of the cascade and quantify the relative improvement in performance gained by each successive stage. The experiments were conducted on the CUAVE database and our results highlight that the dynamics of the visual modality can be used to good effect to improve visual voice activity detection performance.
Resumo:
Background: The quality of stormwater runoff from ports is significant as it can be an important source of pollution to the marine environment. This is also a significant issue for the Port of Brisbane as it is located in an area of high environmental values. Therefore, it is imperative to develop an in-depth understanding of stormwater runoff quality to ensure that appropriate strategies are in place for quality improvement. ---------------- The Port currently has a network of stormwater sample collection points where event based samples together with grab samples are tested for a range of water quality parameters. Whilst this information provides a ‘snapshot’ of the pollutants being washed from the catchment/s, it does not allow for a quantifiable assessment of total contaminant loads being discharged to the waters of Moreton Bay. It also does not represent pollutant build-up and wash-off from the different land uses across a broader range of rainfall events which might be expected. As such, it is difficult to relate stormwater quality to different pollutant sources within the Port environment. ----------------- Consequently, this would make the source tracking of pollutants to receiving waters extremely difficult and in turn the ability to implement appropriate mitigation measures. Also, without this detailed understanding, the efficacy of the various stormwater quality mitigation measures implemented cannot be determined with certainty. --------------- Current knowledge on port stormwater runoff quality Currently, little knowledge exists with regards to the pollutant generation capacity specific to port land uses as these do not necessarily compare well with conventional urban industrial or commercial land use due to the specific nature of port activities such as inter-modal operations and cargo management. Furthermore, traffic characteristics in a port area are different to a conventional urban area. Consequently, as data inputs based on an industrial and commercial land uses for modelling purposes is questionable. ------------------ A comprehensive review of published research failed to locate any investigations undertaken with regards to pollutant build-up and wash-off for port specific land uses. Furthermore, there is very limited information made available by various ports worldwide about the pollution generation potential of their facilities. Published work in this area has essentially focussed on the water quality or environmental values in the receiving waters such as the downstream bay or estuary. ----------------- The Project: The research project is an outcome of the collaborative Partnership between the Port of Brisbane Corporation (POBC) and Queensland University of Technology (QUT). A key feature of this Partnership is the undertaking of ‘cutting edge’ research to strengthen the environmental custodianship of the Port area. This project aims to develop a port specific stormwater quality model to allow informed decision making in relation to stormwater quality improvement in the context of the increased growth of the Port. --------------- Stage 1 of the research project focussed on the assessment of pollutant build-up and wash-off using rainfall simulation from the current Port of Brisbane facilities with the longer-term objective of contributing to the development of ecological risk mitigation strategies for future expansion scenarios. Investigation of complex processes such as pollutant wash-off using naturally occurring rainfall events has inherent difficulties. These can be overcome using simulated rainfall for the investigations. ----------------- The deliverables for Stage 1 included the following: * Pollutant build-up and wash-off profiles for six primary land uses within the Port of Brisbane to be used for water quality model development. * Recommendations with regards to future stormwater quality monitoring and pollution mitigation measures. The outcomes are expected to deliver the following benefits to the Port of Brisbane: * The availability of Port specific pollutant build-up and wash-off data will enable the implementation of customised stormwater pollution mitigation strategies. * The water quality data collected would form the baseline data for a Port specific water quality model for mitigation and predictive purposes. * To be at the cutting-edge in terms of water quality management and environmental best practice in the context of port infrastructure. ---------------- Conclusions: The important conclusions from the study are: * It confirmed that the Port environment is unique in terms of pollutant characteristics and is not comparable to typical urban land uses. * For most pollutant types, the Port land uses exhibited lower pollutant concentrations when compared to typical urban land uses. * The pollutant characteristics varied across the different land uses and were not consistent in terms of the land use. Hence, the implementation of stereotypical structural water quality improvement devices could be of limited value. * The <150m particle size range was predominant in suspended solids for pollutant build-up as well as wash-off. Therefore, if suspended solids are targeted as the surrogate parameter for water quality improvement, this specific particle size range needs to be removed. ------------------- Recommendations: Based on the study results the following preliminary recommendations are made: * Due to the appreciable variation in pollutant characteristics for different port land uses, any water quality monitoring stations should preferably be located such that source areas can be easily identified. * The study results having identified significant pollutants for the different land uses should enable the development of a more customised water quality monitoring and testing regime targeting the critical pollutants. * A ‘one size fits all’ approach may not be appropriate for the different port land uses due to the varying pollutant characteristics. As such, pollution mitigation will need to be specifically tailored to suit the specific land use. * Any structural measures implemented for pollution mitigation to be effective should have the capability to remove suspended solids of size <150m. * Based on the results presented and the particularly the fact that the Port land uses cannot be compared to conventional urban land uses in relation to pollutant generation, consideration should be given to the development of a port specific water quality model.
Resumo:
Background: The quality of stormwater runoff from ports is significant as it can be an important source of pollution to the marine environment. This is also a significant issue for the Port of Brisbane as it is located in an area of high environmental values. Therefore, it is imperative to develop an in-depth understanding of stormwater runoff quality to ensure that appropriate strategies are in place for quality improvement, where necessary. To this end, the Port of Brisbane Corporation aimed to develop a port specific stormwater model for the Fisherman Islands facility. The need has to be considered in the context of the proposed future developments of the Port area. ----------------- The Project: The research project is an outcome of the collaborative Partnership between the Port of Brisbane Corporation (POBC) and Queensland University of Technology (QUT). A key feature of this Partnership is that it seeks to undertake research to assist the Port in strengthening the environmental custodianship of the Port area through ‘cutting edge’ research and its translation into practical application. ------------------ The project was separated into two stages. The first stage developed a quantitative understanding of the generation potential of pollutant loads in the existing land uses. This knowledge was then used as input for the stormwater quality model developed in the subsequent stage. The aim is to expand this model across the yet to be developed port expansion area. This is in order to predict pollutant loads associated with stormwater flows from this area with the longer term objective of contributing to the development of ecological risk mitigation strategies for future expansion scenarios. ----------------- Study approach: Stage 1 of the overall study confirmed that Port land uses are unique in terms of the anthropogenic activities occurring on them. This uniqueness in land use results in distinctive stormwater quality characteristics different to other conventional urban land uses. Therefore, it was not scientifically valid to consider the Port as belonging to a single land use category or to consider as being similar to any typical urban land use. The approach adopted in this study was very different to conventional modelling studies where modelling parameters are developed using calibration. The field investigations undertaken in Stage 1 of the overall study helped to create fundamental knowledge on pollutant build-up and wash-off in different Port land uses. This knowledge was then used in computer modelling so that the specific characteristics of pollutant build-up and wash-off can be replicated. This meant that no calibration processes were involved due to the use of measured parameters for build-up and wash-off. ---------------- Conclusions: Stage 2 of the study was primarily undertaken using the SWMM stormwater quality model. It is a physically based model which replicates natural processes as closely as possible. The time step used and catchment variability considered was adequate to accommodate the temporal and spatial variability of input parameters and the parameters used in the modelling reflect the true nature of rainfall-runoff and pollutant processes to the best of currently available knowledge. In this study, the initial loss values adopted for the impervious surfaces are relatively high compared to values noted in research literature. However, given the scientifically valid approach used for the field investigations, it is appropriate to adopt the initial losses derived from this study for future modelling of Port land uses. The relatively high initial losses will reduce the runoff volume generated as well as the frequency of runoff events significantly. Apart from initial losses, most of the other parameters used in SWMM modelling are generic to most modelling studies. Development of parameters for MUSIC model source nodes was one of the primary objectives of this study. MUSIC, uses the mean and standard deviation of pollutant parameters based on a normal distribution. However, based on the values generated in this study, the variation of Event Mean Concentrations (EMCs) for Port land uses within the given investigation period does not fit a normal distribution. This is possibly due to the fact that only one specific location was considered, namely the Port of Brisbane unlike in the case of the MUSIC model where a range of areas with different geographic and climatic conditions were investigated. Consequently, the assumptions used in MUSIC are not totally applicable for the analysis of water quality in Port land uses. Therefore, in using the parameters included in this report for MUSIC modelling, it is important to note that it may result in under or over estimations of annual pollutant loads. It is recommended that the annual pollutant load values given in the report should be used as a guide to assess the accuracy of the modelling outcomes. A step by step guide for using the knowledge generated from this study for MUSIC modelling is given in Table 4.6. ------------------ Recommendations: The following recommendations are provided to further strengthen the cutting edge nature of the work undertaken: * It is important to further validate the approach recommended for stormwater quality modelling at the Port. Validation will require data collection in relation to rainfall, runoff and water quality from the selected Port land uses. Additionally, the recommended modelling approach could be applied to a soon-to-be-developed area to assess ‘before’ and ‘after’ scenarios. * In the modelling study, TSS was adopted as the surrogate parameter for other pollutants. This approach was based on other urban water quality research undertaken at QUT. The validity of this approach should be further assessed for Port land uses. * The adoption of TSS as a surrogate parameter for other pollutants and the confirmation that the <150 m particle size range was predominant in suspended solids for pollutant wash-off gives rise to a number of important considerations. The ability of the existing structural stormwater mitigation measures to remove the <150 m particle size range need to be assessed. The feasibility of introducing source control measures as opposed to end-of-pipe measures for stormwater quality improvement may also need to be considered.
Resumo:
What really changed for Australian Aboriginal and Torres Strait Islander people between Paul Keating’s Redfern Park Speech (Keating 1992) and Kevin Rudd’s Apology to the stolen generations (Rudd 2008)? What will change between the Apology and the next speech of an Australian Prime Minister? The two speeches were intricately linked, and they were both personal and political. But do they really signify change at the political level? This paper reflects my attempt to turn the gaze away from Aboriginal and Torres Strait Islander people, and back to where the speeches originated: the Australian Labor Party (ALP). I question whether the changes foreshadowed in the two speeches – including changes by the Australian public and within Australian society – are evident in the internal mechanisms of the ALP. I also seek to understand why non-Indigenous women seem to have given in to the existing ways of the ALP instead of challenging the status quo which keeps Aboriginal and Torres Strait Islander peoples marginalised. I believe that, without a thorough examination and a change in the ALP’s practices, the domination and subjugation of Indigenous peoples will continue – within the Party, through the Australian political process and, therefore, through governments.
Resumo:
While close talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied for ad-hoc microphone arrays. A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach which includes clustering the microphones and selecting the clusters based on their proximity to the speaker is proposed. Novel experiments are conducted to demonstrate that the proposed method to automatically select clusters of microphones (ie, a subarray), closely located both to each other and to the desired speech source, may in fact provide a more robust speech enhancement and recognition than the full array could.
Resumo:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.
Resumo:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but such approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks on the other hand, optimise the parameters of speech enhancement algorithms based on state sequences generated by a speech recogniser for utterances of known transcriptions. Previous applications of LIMA frameworks have generated a set of global enhancement parameters for all model states without taking in account the distribution of model occurrence, making optimisation susceptible to favouring frequently occurring models, in particular silence. In this paper, we demonstrate the existence of highly disproportionate phonetic distributions on two corpora with distinct speech tasks, and propose to normalise the influence of each phone based on a priori occurrence probabilities. Likelihood analysis and speech recognition experiments verify this approach for improving ASR performance in noisy environments.
Resumo:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology has made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language specific resources. This motivates the need for techniques which reduce this language-specific, resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource poor languages. Cross Lingual ASR emerges as a means for addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract, and accordingly, is human, and not language specific. As a result, a common inventory of sounds exists across languages; a property which is exploitable, as sounds from a resource poor, target language can be recognised using models trained on resource rich, source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments has gained consumer confidence. Pragmatically, if cross lingual techniques are to considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross lingual techniques using two speech environments; clean read speech and conversational telephone speech. Languages used in evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly when in the more taxing conversational environment. Two separate approaches for addressing this degradation are proposed. The first is based on deriving better target language lexical representation, in terms of the source language model set. The second, and ultimately more successful approach, focuses on improving the classification accuracy of context-dependent (CD) models, by catering for the adverse influence of languages specific phonotactic properties. Whilst the primary research goal in this thesis is directed towards improving cross lingual techniques, the catalyst for investigating its use was based on expressed interest from several organisations for an Indonesian ASR capability. In Indonesia alone, there are over 200 million speakers of some Malay variant, provides further impetus and commercial justification for speech related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10000 word recognition vocabulary for the Indonesian language.
Resumo:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these productcode vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of lowrate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.