295 results for Opportunity Recognition
Abstract:
In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.
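As a rough illustration of the dual-microphone delay-and-sum idea named in the abstract above, the following minimal sketch aligns one channel to the other by an integer sample delay and averages them. It is a floating-point software toy, not the paper's FPGA design; the function name `delay_and_sum` and the parameter `delay_b` are hypothetical, and the delay is assumed to be known in advance.

```python
import math

def delay_and_sum(mic_a, mic_b, delay_b):
    """Average two channels after aligning mic_b to mic_a.

    delay_b (hypothetical parameter) is the number of samples by which
    the target speech arrives later at microphone B than at microphone A.
    Samples of B that fall outside the recording are treated as zero.
    """
    out = []
    for i in range(len(mic_a)):
        j = i + delay_b  # sample of B that lines up with sample i of A
        b = mic_b[j] if 0 <= j < len(mic_b) else 0.0
        out.append(0.5 * (mic_a[i] + b))
    return out

# Toy usage: the same sinusoid reaches microphone B three samples late.
speech = [math.sin(0.3 * k) for k in range(50)]
channel_a = speech
channel_b = [0.0] * 3 + speech[:-3]
enhanced = delay_and_sum(channel_a, channel_b, 3)
```

After alignment the speech component adds coherently, while noise that is uncorrelated between the two microphones is attenuated by the averaging.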
Abstract:
This article assesses the 'Managing Diversity' (MD) approach in Australia, examining its drivers, discussing its relationship to legislation designed to promote equity, and examining it as a set of management practices. It has been plausibly argued, on efficiency grounds, that responsibility for achieving equality objectives must be shifted to organisations as this links contextual conditions to organisational processes. However, even where there is some prescription and guidance such as that provided by Australian Equal Employment Opportunity (EEO) legislation targeted specifically to women employees, both practice and outcomes are variable. This is even more the case with MD where there are no guiding principles or legislative support. The article examines the best practice EEO and MD programs of Australian organisations to demonstrate the approaches and programs that are being developed at the workplace and to highlight the limitations of the 'business case' approach underlying such programs.
Abstract:
The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory. Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions and the emphasis of both positive and negative outcomes allows the benefits and pitfalls of design decisions to become apparent. A brief description of the Court is given, describing the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: Front of house, where the courtroom itself is, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems. Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic. The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.
Abstract:
Identifying an individual from surveillance video is a difficult, time-consuming and labour-intensive process. The proposed system aims to streamline this process by filtering out unwanted scenes and enhancing an individual's face through super-resolution. An automatic face recognition system is then used to identify the subject or present the human operator with likely matches from a database. A person tracker is used to speed up the subject detection and super-resolution process by tracking moving subjects and cropping a region of interest around the subject's face, which reduces both the number and the size of the image frames to be super-resolved. In this paper, experiments have been conducted to demonstrate how the optical flow super-resolution method used improves surveillance imagery for visual inspection as well as for automatic face recognition on an Eigenface and an Elastic Bunch Graph Matching system. The optical-flow-based method has also been benchmarked against the "hallucination" algorithm, interpolation methods and the original low-resolution images. Results show that both super-resolution algorithms improved recognition rates significantly. Although the hallucination method resulted in slightly higher recognition rates, the optical flow method produced fewer artifacts and more visually correct images suitable for human consumption.
Abstract:
Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased. Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, making them sub-optimal for ASR applications. This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain. Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data. Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction.
Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs. Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware. The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
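For readers unfamiliar with the baseline that the dissertation abstract above builds on, here is a minimal sketch of textbook magnitude-domain spectral subtraction for a single frame. It is not the LIMA or phase-aware method described in the abstract; the function name, the over-subtraction factor `alpha` and the spectral floor `floor` are illustrative assumptions, and the noise magnitude estimate is assumed to be given.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from a noisy magnitude spectrum.

    noisy_mag, noise_mag: per-frequency-bin magnitudes for one frame.
    alpha: over-subtraction factor (assumed value, not from the source).
    floor: fraction of the noisy magnitude kept as a lower bound, so the
           estimate never becomes negative.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        estimate = y - alpha * n
        cleaned.append(max(estimate, floor * y))
    return cleaned

# Toy usage with three frequency bins; in the last bin the noise estimate
# exceeds the observation, so the spectral floor applies.
clean_estimate = spectral_subtract([1.0, 0.5, 0.2], [0.2, 0.2, 0.3])
```

In conventional spectral subtraction the enhanced magnitudes are recombined with the noisy phase to resynthesise the signal, which is the simplification that the phase-aware work described above revisits.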
Abstract:
This paper examines the varying approaches and methodologies adopted when calculating holding costs, focusing on greenfield development. Whilst acknowledging there may be some consistency in embracing first principles relating to holding cost theory, a review of the literature reveals a considerable lack of uniformity in this regard. There is even less clarity in quantitative determination, especially in Australia, where only limited empirical analysis has been undertaken. Despite a growing quantum of research undertaken in relation to various elements connected with housing affordability, the matter of holding costs has not been well addressed, regardless of its part in the highly prioritised Australian Government’s housing research agenda. The end result has been a modicum of qualitative commentary relating to holding costs. There have been few attempts at finer-tuned analysis that exposes a quantified level of holding cost calculated with underlying rigour. Holding costs can take many forms, but they inevitably involve the computation of “carrying costs” of an initial outlay that has yet to fully realise its ultimate yield. Although sometimes considered a “hidden” cost, it is submitted that holding costs prospectively represent a major determinant of value. If this is the case, then, considered in the context of housing affordability, their effect is potentially pervasive.
Abstract:
Recovering position from sensor information is an important problem in mobile robotics, known as localisation. Localisation requires a map or some other description of the environment to provide the robot with a context to interpret sensor data. The mobile robot system under discussion uses an artificial neural representation of position. Building a geometrical map of the environment with a single camera and artificial neural networks is difficult. Instead, it would be simpler to learn position as a function of the visual input. Usually when learning images, an intermediate representation is employed. An appropriate starting point for biologically plausible image representation is the complex cells of the visual cortex, which have invariance properties that appear useful for localisation. The effectiveness for localisation of two different complex cell models is evaluated. Finally, the ability of a simple neural network with single-shot learning to recognise these representations and localise a robot is examined.
Abstract:
The paper presents a fast and robust stereo object recognition method. The method is currently unable to identify the rotation of objects, so it is best suited to locating spheres, which are rotationally invariant. Approximate methods for locating non-spherical objects have been developed. Fundamental to the method is that the correspondence problem is solved using information about the dimensions of the object being located. This is in contrast to previous stereo object recognition systems, where the scene is first reconstructed by point-matching techniques. The method is suitable for real-time application on low-power devices.
Abstract:
Acoustically, car cabins are extremely noisy and, as a consequence, audio-only in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy for circumventing this problem through audio-visual automatic speech recognition (AVASR). However, implementing AVASR requires a system able to accurately locate and track the driver's face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method of locating and tracking the driver’s lips for audio-visual speech recognition systems, despite the visual variability of illumination and head pose.
Abstract:
Several approaches have been proposed to recognize handwritten Bengali characters using different curve-fitting algorithms and curvature analysis. In this paper, a new algorithm (Curve-fitting Algorithm) to identify various strokes of a handwritten character is developed. The curve-fitting algorithm helps recognize strokes of different patterns (line, quadratic curve) precisely, which greatly reduces the error-elimination burden. Implementation of this Modified Syntactic Method demonstrates significant improvement in the recognition of Bengali handwritten characters.
Abstract:
Acoustically, car cabins are extremely noisy and, as a consequence, existing audio-only speech recognition systems for voice-based control of vehicle functions, such as the GPS-based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g. lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio-Visual Automatic Speech Recognition (AVASR). Research in the AVASR field has been ongoing for the past twenty-five years, with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is that most research to date has neglected to address variabilities in the visual domain, such as illumination and viewpoint, in the design of the visual front-end of the AVASR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is a publicly available in-car database, and we show that the use of visual speech in conjunction with the audio modality is a better approach to improving the robustness and effectiveness of voice-only recognition systems in car cabin environments.
Abstract:
When classifying a signal, ideally we want our classifier to trigger a large response when it encounters a positive example and have little to no response for all other examples. Unfortunately, in practice this does not occur: responses fluctuate, often causing false alarms. There are many reasons why this is the case, most notably the failure to incorporate the dynamics of the signal into the classification. In facial expression recognition, this has been highlighted as one major research question. In this paper we present a novel technique which incorporates the dynamics of the signal, producing a strong response when the peak expression is found and suppressing all other responses as much as possible. We conducted preliminary experiments on the extended Cohn-Kanade (CK+) database, which show the benefits of the technique. The ability to automatically and accurately recognize facial expressions of drivers is highly relevant in automotive settings. For example, the early recognition of “surprise” could indicate that an accident is about to occur, and various safeguards could immediately be deployed to avoid or minimize injury and damage.
Abstract:
In 2009, Religious Education is a designated key learning area in Catholic schools in the Archdiocese of Brisbane and, indeed, across Australia. Over the years, though, different conceptualisations of the nature and purpose of religious education have led to the construction of different approaches to the classroom teaching of religion. By investigating the development of religious education policy in the Archdiocese of Brisbane from 1984 to 2003, the study seeks to trace the emergence of new discourses on religious education. The study understands religious education to refer to a lifelong process that occurs through a variety of forms (Moran, 1989). In Catholic schools, it refers both to co-curricular activities, such as retreats and school liturgies, and the classroom teaching of religion. It is the policy framework for the classroom teaching of religion that this study explores. The research was undertaken using a policy case study approach to gain a detailed understanding of how new conceptualisations of religious education emerged at a particular site of policy production, in this case, the Archdiocese of Brisbane. The study draws upon Yeatman’s (1998) description of policy as occurring “when social actors think about what they are doing and why in relation to different and alternative possible futures” (p. 19) and views policy as consisting of more than texts themselves. Policy texts result from struggles over meaning (Taylor, 2004) in which specific discourses are mobilised to support particular views. The study has a particular interest in the analysis of Brisbane religious education policy texts, the discursive practices that surrounded them, and the contexts in which they arose. Policy texts are conceptualised in the study as representing “temporary settlements” (Gale, 1999).
Such settlements are asymmetrical, temporary and dependent on context: asymmetrical in that dominant actors are favoured; temporary because dominant actors are always under challenge by other actors in the policy arena; and context-dependent because new situations require new settlements. To investigate the official policy documents, the study used Critical Discourse Analysis (hereafter referred to as CDA) as a research tool that affords the opportunity for researchers to map and chart the emergence of new discourses within the policy arena. As developed by Fairclough (2001), CDA is a three-dimensional application of critical analysis to language. In the Brisbane religious education arena, policy texts formed a genre chain (Fairclough, 2004; Taylor, 2004) which was a focus of the study. There are two features of texts that form genre chains: texts are systematically linked to one another, and systematic relations of recontextualisation exist between the texts. Fairclough’s (2005) concepts of “imaginary space” and “frameworks for action” (p. 65) within the policy arena were applied to the Brisbane policy arena to investigate the relationship between policy statements and subsequent guidelines documents. Five key findings emerged from the study. First, application of CDA to policy documents revealed that a fundamental reconceptualisation of the nature and purpose of classroom religious education in Catholic schools occurred in the Brisbane policy arena over the last twenty-five years. Second, a disjuncture existed between catechetical discourses that continued to shape religious education policy statements, and educational discourses that increasingly shaped guidelines documents. Third, recontextualisation between policy documents was evident and dependent on the particular context in which religious education occurred.
Fourth, at subsequent links in the chain, actors created their own “imaginary space”, thereby altering orders of discourse within the policy arena, with different actors being either foregrounded or marginalised. Fifth, intertextuality was more evident in the later links in the genre chain (i.e. 1994 policy statement and 1997 guidelines document) than in earlier documents. On the basis of the findings of the study, six recommendations are made. First, the institutional Church should carefully consider the contribution that the Catholic school can make to the overall pastoral mission of the diocese in twenty-first century Australia. Second, policymakers should articulate a nuanced understanding of the relationship between catechesis and education with regard to the religion classroom. Third, there should be greater awareness of the connections among policies relating to Catholic schools – especially the connection between enrolment policy and religious education policy. Fourth, there should be greater consistency between policy documents. Fifth, policy documents should be helpful for those to whom they are directed (i.e. Catholic schools, teachers). Sixth, “imaginary space” (Fairclough, 2005) in policy documents needs to be constructed in a way that allows for multiple “frameworks for action” (Fairclough, 2005) through recontextualisation. The findings of this study are significant in a number of ways. For religious educators, the study highlights the need to develop a shared understanding of the nature and purpose of classroom religious education. It argues that this understanding must take into account the multifaith nature of Australian society and the changing social composition of Catholic schools themselves. Greater recognition should be given to the contribution that religious studies courses such as Study of Religion make to the overall religious development of a person. 
In view of the social composition of Catholic schools, there is also an issue of ecclesiological significance concerning the conceptualisation of the relationship between the institutional Catholic Church and Catholic schools. Finally, the study is of significance because of its application of CDA to religious education policy documents. Use of CDA reveals the foregrounding, marginalising, or excluding of various actors in the policy arena.
Abstract:
While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays, in contrast to close-talking microphones, alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms, which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied to ad-hoc microphone arrays.
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach is proposed, which involves clustering the microphones and selecting the clusters based on their proximity to the speaker. Novel experiments are conducted to demonstrate that the proposed method of automatically selecting clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.