762 results for Perceptual Speech Evaluation
in Queensland University of Technology - ePrints Archive
Abstract:
Automatic speech recognition from multiple distant microphones poses significant challenges because of noise and reverberation. The quality of speech acquisition may vary between microphones because of speaker movement and channel distortions. This paper proposes a channel selection approach for selecting reliable channels based on a selection criterion operating in the short-term modulation spectrum domain. The proposed approach quantifies the relative strength of speech from each microphone and speech obtained from beamforming modulations. The new technique is compared experimentally in real reverberant conditions in terms of perceptual evaluation of speech quality (PESQ) measures and word error rate (WER). Overall improvement in recognition rate is observed using delay-sum and superdirective beamformers compared to the case when the channel is selected randomly using circular microphone arrays.
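The word error rate mentioned above is conventionally computed as the word-level edit distance between a reference transcript and the recogniser output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper; the function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: simple whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.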
Abstract:
While close talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied for ad-hoc microphone arrays. 
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach is proposed, which clusters the microphones and selects the clusters based on their proximity to the speaker. Novel experiments are conducted to demonstrate that the proposed method of automatically selecting clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.
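To make the "traditional geometrical formulation for the steering vector" concrete, the sketch below builds a far-field (plane-wave) steering vector from known microphone positions and applies it in a delay-and-sum beamformer at a single frequency. It is an illustration under stated assumptions rather than the thesis's implementation: the names `steering_vector` and `delay_and_sum`, the speed of sound of 343 m/s, the far-field model, and the sign convention are all choices made here.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(mic_positions, direction, freq_hz):
    """Far-field steering vector for one frequency.

    mic_positions: list of (x, y, z) coordinates in metres.
    direction: vector pointing from the array origin towards the source.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    vector = []
    for position in mic_positions:
        # A microphone further along `unit` is closer to the source,
        # so the wavefront reaches it earlier (negative relative delay).
        delay = -sum(p * u for p, u in zip(position, unit)) / SPEED_OF_SOUND
        vector.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return vector


def delay_and_sum(observations, steering):
    """Phase-align each channel with the steering vector and average."""
    aligned = (a.conjugate() * x for a, x in zip(steering, observations))
    return sum(aligned) / len(steering)
```

In the blind setting described above, the microphone positions passed to `steering_vector` would first have to come from an array calibration step, which this sketch does not cover.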
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.
Abstract:
This study is conducted within the IS-Impact Research Track at Queensland University of Technology (QUT). The goal of the IS-Impact Track is, "to develop the most widely employed model for benchmarking information systems in organizations for the joint benefit of both research and practice" (Gable et al, 2006). IS-Impact is defined as "a measure at a point in time, of the stream of net benefits from the IS [Information System], to date and anticipated, as perceived by all key-user-groups" (Gable Sedera and Chan, 2008). Track efforts have yielded the bicameral IS-Impact measurement model; the "impact" half includes Organizational-Impact and Individual-Impact dimensions; the "quality" half includes System-Quality and Information-Quality dimensions. The IS-Impact model, by design, is intended to be robust, simple and generalisable, to yield results that are comparable across time, stakeholders, different systems and system contexts. The model and measurement approach employs perceptual measures and an instrument that is relevant to key stakeholder groups, thereby enabling the combination or comparison of stakeholder perspectives. Such a validated and widely accepted IS-Impact measurement model has both academic and practical value. It facilitates systematic operationalisation of a main dependent variable in research (IS-Impact), which can also serve as an important independent variable. For IS management practice it provides a means to benchmark and track the performance of information systems in use. From examination of the literature, the study proposes that IS-Impact is an Analytic Theory. Gregor (2006) defines Analytic Theory simply as theory that ‘says what is’, base theory that is foundational to all other types of theory. The overarching research question thus is "Does IS-Impact positively manifest the attributes of Analytic Theory?" In order to address this question, we must first answer the question "What are the attributes of Analytic Theory?" 
The study identifies the main attributes of analytic theory as: (1) Completeness, (2) Mutual Exclusivity, (3) Parsimony, (4) Appropriate Hierarchy, (5) Utility, and (6) Intuitiveness. The value of empirical research in Information Systems is often assessed along two main dimensions: rigor and relevance. Those Analytic Theory attributes associated with the ‘rigor’ of the IS-Impact model, namely completeness, mutual exclusivity, parsimony and appropriate hierarchy, have been addressed in prior research (e.g. Gable et al, 2008). Though common tests of rigor are widely accepted and relatively uniformly applied (particularly in relation to positivist, quantitative research), relevance has seldom been given the same systematic attention. This study assumes a mainly practice perspective, and emphasises the methodical evaluation of the Analytic Theory ‘relevance’ attributes represented by the Utility and Intuitiveness of the IS-Impact model. Thus, related research questions are: "Is the IS-Impact model intuitive to practitioners?" and "Is the IS-Impact model useful to practitioners?" March and Smith (1995) identify four outputs of Design Science: constructs, models, methods and instantiations (Design Science research may involve one or more of these). IS-Impact can be viewed as a design science model, composed of Design Science constructs (the four IS-Impact dimensions and the two model halves), and instantiations in the form of management information (IS-Impact data organised and presented for management decision making). In addition to methodically evaluating the Utility and Intuitiveness of the IS-Impact model and its constituent constructs, the study also aims to evaluate the derived management information. Thus, further research questions are: "Is the IS-Impact derived management information intuitive to practitioners?" and "Is the IS-Impact derived management information useful to practitioners?"
The study employs a longitudinal design entailing three surveys over 4 years (the 1st involving secondary data) of the Oracle-Financials application at QUT, interspersed with focus groups involving senior financial managers. The study also entails a survey of Financials at four other Australian Universities. The three focus groups respectively emphasise: (1) the IS-Impact model, (2) the 2nd survey at QUT (descriptive), and (3) comparison across surveys within QUT, and between QUT and the group of Universities. Aligned with the track goal of producing IS-Impact scores that are highly comparable, the study also addresses the more specific utility-related questions, "Is IS-Impact derived management information a useful comparator across time?" and "Is IS-Impact derived management information a useful comparator across universities?" The main contribution of the study is evidence of the utility and intuitiveness of IS-Impact to practice, thereby further substantiating the practical value of the IS-Impact approach, and also motivating continuing and further research on the validity of IS-Impact, and research employing the IS-Impact constructs in descriptive, predictive and explanatory studies. The study also has value methodologically as an example of relatively rigorous attention to relevance. A further key contribution is the clarification and instantiation of the full set of analytic theory attributes.
Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers speaking different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
Abstract:
The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. In order to construct the final mixed-speech database, a collection of over 10 hours of background noise was conducted across 10 unique locations covering 5 common noise scenarios, to create the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. The evaluation of five baseline VAD systems on the QUT-NOISE-TIMIT corpus is conducted to validate the data and show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.
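Mixing clean speech with background noise at a chosen signal-to-noise ratio comes down to scaling the noise so that the ratio of average powers matches the target. The sketch below shows that calculation only; it is not the corpus construction procedure itself, and the function name `mix_at_snr`, the use of plain sample lists, and the whole-signal power estimate (rather than active-speech level) are assumptions made for illustration.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # assumption: take the noise from its start

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")

    # SNR_dB = 10 * log10(speech_power / (gain^2 * noise_power)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Sweeping `snr_db` over a range of values for the same speech and noise pair gives a family of test conditions of increasing difficulty.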
Abstract:
In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a-priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross correlation based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
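Cross-correlation based localisation generally starts from the time delay between pairs of microphones: the lag that maximises the cross-correlation of two channels is taken as the arrival-time difference. The following is a minimal time-domain sketch of that idea, not the method used in the paper (which may, for instance, use a weighted frequency-domain variant); the function name `estimate_delay` and the brute-force search over lags are assumptions.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += ref_sample * delayed[j]
        # Keep the first lag that achieves the highest correlation.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sampling rate gives the delay in seconds, which can then be combined across microphone pairs to constrain the speaker position.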
Abstract:
Automobiles have deeply impacted the way in which we travel but they have also contributed to many deaths and injuries due to crashes. A number of reasons for these crashes have been pointed out by researchers. Inexperience has been identified as a contributing factor to road crashes. Drivers' driving abilities also play a vital role in judging the road environment and reacting in time to avoid any possible collision. Therefore, drivers' perceptual and motor skills remain the key factors impacting on road safety. Our failure to understand what is really important for learners, in terms of competent driving, is one of the many challenges for building better training programs. Driver training is one of the interventions aimed at decreasing the number of crashes that involve young drivers. Currently, there is a need to develop a comprehensive driver evaluation system that benefits from the advances in Driver Assistance Systems. A multidisciplinary approach is necessary to explain how driving abilities evolve with on-road driving experience. To our knowledge, driver assistance systems have never been comprehensively used in a driver training context to assess the safety aspect of driving. The aim and novelty of this thesis is to develop and evaluate an Intelligent Driver Training System (IDTS) as an automated assessment tool that will help drivers and their trainers to comprehensively view complex driving manoeuvres and potentially provide effective feedback by post-processing the data recorded during driving. This system is designed to help driver trainers to accurately evaluate driver performance and has the potential to provide valuable feedback to the drivers. Since driving is dependent on fuzzy inputs from the driver (i.e. 
approximate distance calculation from the other vehicles, approximate assumption of the other vehicle speed), it is necessary that the evaluation system is based on criteria and rules that handle the uncertain and fuzzy characteristics of the driving tasks. Therefore, the proposed IDTS utilizes fuzzy set theory for the assessment of driver performance. The proposed research program focuses on integrating the multi-sensory information acquired from the vehicle, driver and environment to assess driving competencies. After information acquisition, the current research focuses on automated segmentation of the selected manoeuvres from the driving scenario. This leads to the creation of a model that determines a “competency” criterion through the driving performance protocol used by driver trainers (i.e. expert knowledge) to assess drivers. This is achieved by comprehensively evaluating and assessing the data stream acquired from multiple in-vehicle sensors using fuzzy rules and classifying the driving manoeuvres (i.e. overtake, lane change, T-crossing and turn) as low or high competency. The fuzzy rules use parameters such as following distance, gaze depth and scan area, distance with respect to lanes and excessive acceleration or braking during the manoeuvres to assess competency. These rules that identify driving competency were initially designed with the help of expert knowledge (i.e. driver trainers). In order to fine-tune these rules and the parameters that define them, a driving experiment was conducted to identify the empirical differences between novice and experienced drivers. The results from the driving experiment indicated that significant differences existed between novice and experienced drivers, in terms of their gaze pattern and duration, speed, stop time at the T-crossing, lane keeping and the time spent in lanes while performing the selected manoeuvres. 
These differences were used to refine the fuzzy membership functions and rules that govern the assessments of the driving tasks. Next, this research focused on providing an integrated visual assessment interface to both driver trainers and their trainees. By providing a rich set of interactive graphical interfaces, displaying information about the driving tasks, the Intelligent Driver Training System (IDTS) visualisation module has the potential to give empirical feedback to its users. Lastly, the validation of the IDTS system’s assessment was conducted by comparing IDTS objective assessments, for the driving experiment, with the subjective assessments of the driver trainers for particular manoeuvres. Results show that IDTS not only matched the subjective assessments made by driver trainers during the driving experiment but also identified some additional driving manoeuvres performed with low competency that were not identified by the driver trainers, due to the increased mental workload of trainers when assessing the multiple variables that constitute driving. The validation of IDTS emphasized the need for an automated assessment tool that can segment the manoeuvres from the driving scenario, further investigate the variables within that manoeuvre to determine the manoeuvre’s competency and provide integrated visualisation regarding the manoeuvre to its users (i.e. trainers and trainees). Through analysis and validation it was shown that IDTS is a useful assistance tool for driver trainers to empirically assess and potentially provide feedback regarding the manoeuvres undertaken by the drivers.
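As a rough illustration of how fuzzy membership functions and rules can turn sensor readings into a competency judgement, the sketch below grades two of the parameters named above (following distance and braking) with linear ramps and combines them with a fuzzy OR. Everything specific here is hypothetical: the function names, the breakpoints (10 m, 30 m, 2 m/s², 6 m/s²) and the single rule are invented for the example and are not taken from the thesis.

```python
def falling_ramp(x, full, zero):
    """Membership 1.0 at or below `full`, 0.0 at or above `zero`, linear between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising_ramp(x, zero, full):
    """Membership 0.0 at or below `zero`, 1.0 at or above `full`, linear between."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def low_competency_degree(following_distance_m, peak_braking_ms2):
    """Degree (0..1) to which a manoeuvre counts as low competency.

    Hypothetical rule: low competency IF following distance is too short
    OR braking is harsh. Fuzzy OR is taken as the maximum of the memberships.
    """
    too_close = falling_ramp(following_distance_m, full=10.0, zero=30.0)
    harsh_braking = rising_ramp(peak_braking_ms2, zero=2.0, full=6.0)
    return max(too_close, harsh_braking)
```

In a system of the kind described, breakpoints like these would be the quantities refined from the measured differences between novice and experienced drivers.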
Abstract:
In 1990 the Dispute Resolution Centres Act, 1990 (Qld) (the Act) was passed by the Queensland Parliament. In the second reading speech for the Dispute Resolution Centres Bill on May 1990 the Hon Dean Wells stated that the proposed legislation would make mediation services available “in a non-coercive, voluntary forum where, with the help of trained mediators, the disputants will be assisted towards their own solutions to their disputes, thereby ensuring that the result is acceptable to the parties” (Hansard, 1990, 1718). It was recognised at that time that a method for resolving disputes was necessary for which “the conventional court system is not always equipped to provide lasting resolution” (Hansard, 1990, 1717). In particular, the lasting resolution of “disputes between people in continuing relationships” was seen as made possible through the new legislation; for example, “domestic disputes, disputes between employees, and neighbourhood disputes relating to such issues as overhanging tree branches, dividing fences, barking dogs, smoke, noise and other nuisances are occurring continually in the community” (Hansard, 1990, 1717). The key features of the proposed form of mediation in the Act were articulated as follows: “attendance of both parties at mediation sessions is voluntary; a party may withdraw at any time; mediation sessions will be conducted with as little formality and technicality as possible; the rules of evidence will not apply; any agreement reached is not enforceable in any court; although it could be made so if the parties chose to proceed that way; and the provisions of the Act do not affect any rights or remedies that a party to a dispute has apart from the Act” (Hansard, 1990, 1718). 
Since the introduction of the Act, the Alternative Dispute Resolution Branch of the Queensland Department of Justice and Attorney General has offered mediation services through, first the Community Justice Program (CJP), and then the Dispute Resolution Centres (DRCs) for a range of family, neighbourhood, workplace and community disputes. These services have mirrored those available through similar government agencies in other states such as the Community Justice Centres of NSW and the Victorian Dispute Resolution Centres. Since 1990, mediation has become one of the fastest growing forms of alternative dispute resolution (ADR). Sourdin has commented that "In addition to the growth in court-based and community-based dispute resolution schemes, ADR has been institutionalised and has grown within Australia and overseas” (2005, 14). In Australia, in particular, the development of ADR service provision “has been assisted by the creation and growth of professional organisations such as the Leading Edge Alternative Dispute Resolvers (LEADR), the Australian Commercial Dispute Centres (ACDC), Australian Disputes Resolution Association (ADRA), Conflict Resolution Network, and the Institute of Arbitrators and Mediators Australia (IAMA)” (Sourdin, 2005, 14). The increased emphasis on the use of ADR within education contexts (particularly secondary and tertiary contexts) has “also led to an increasing acceptance and understanding of (ADR) processes” (Sourdin, 2005, 14). Proponents of the mediation process, in particular, argue that much of its success derives from the inherent flexibility and creativity of the agreements reached through the mediation process and that it is a relatively low cost option in many cases (Menkel-Meadow, 1997, 417). 
It is also accepted that one of the main reasons for the success of mediation is the high level of participation by the parties involved, which creates a sense of ownership of, and commitment to, the terms of the agreement (Boulle, 2005, 65). These characteristics are associated with some of the core values of mediation, particularly as practised in community-based models as found at the DRCs. These core values include voluntary participation, party self-determination and party empowerment (Boulle, 2005, 65). For this reason mediation is argued to be an effective approach to resolving disputes that creates a lasting resolution of the issues. Evaluation of the mediation process, particularly in the context of the growth of ADR, has been an important aspect of the development of the process (Sourdin, 2008). Writing in 2005, for example, Boulle states that “although there is a constant refrain for more research into mediation practice, there has been a not insignificant amount of mediation measurement, both in Australia and overseas” (Boulle, 2005, 575). The positive claims of mediation have been supported to a significant degree by evaluations of the efficiency and effectiveness of the process. A common indicator of the effectiveness of mediation is the settlement rate achieved. High settlement rates for mediated disputes have been found for Australia (Altobelli, 2003) and internationally (Alexander, 2003). Boulle notes that mediation agreement rates claimed by service providers range from 55% to 92% (Boulle, 2005, 590). The annual reports for the Alternative Dispute Resolution Branch of the Queensland Department of Justice and Attorney-General considered prior to the commencement of this study generally indicated an approximate settlement figure of 86% for the Queensland Dispute Resolution Centres. More recently, the 2008-2009 annual report states that of the 2291 civil disputes mediated in 2007-2008, 86% reached an agreement. 
Further, of the 2693 civil disputes mediated in 2008-2009, 73% reached an agreement. These results are noted in the report as indicating “the effectiveness of mediation in resolving disputes” and as reflecting “the high level of agreement achieved for voluntary mediations” (Annual Report, 2008-2009, online). Whilst the settlement rates for the DRCs are strong, parties are rarely contacted for long term follow-up to assess whether agreements reached during mediation lasted to the satisfaction of each party. It has certainly been the case that the Dispute Resolution Centres of Queensland have not been resourced to conduct long-term follow-up assessments of mediation agreements. As Wade notes, "it is very difficult to compare "success" rates” and whilst “politicians want the comparison studies (they) usually do not want the delay and expense of accurate studies" (1998, 114). To date, therefore, it is fair to say that the efficiency of the mediation process has been evaluated but not necessarily its effectiveness. Rather, the practice at the Queensland DRCs has been to evaluate the quality of mediation service provision and of the practice of the mediation process. This has occurred, for example, through follow-up surveys of parties' satisfaction rates with the mediation service. In most other respects it is fair to say that the Centres have relied on the high settlement rates of the mediation process as a sign of the effectiveness of mediation (Annual Reports 1991 - 2010). Research of the mediation literature conducted for the purpose of this thesis has also indicated that there is little evaluative literature that provides an in-depth analysis and assessment of the longevity of mediated agreements. 
Instead evaluative studies of mediation tend to assess how mediation is conducted, or compare mediation with other conflict resolution options, or assess the agreement rate of mediations, including parties' levels of satisfaction with the service provision of the dispute resolution service provider (Boulle, 2005, Chapter 16).
Abstract:
Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey aims to provide an overview of the state-of-the-art development in these areas. We begin by discussing the meaning of tagging in different sound areas. Some examples of sound tagging applications are introduced in order to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech and environmental sound tagging, we compare them and state the research progress to date. Research gaps are identified for each research area, and the common features and differences between the three areas are identified as well. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. Finally, we summarise the worldwide distribution of countries that have been active in sound tagging research.
Abstract:
This work aims to contribute to the reliability and integrity of perceptual systems of unmanned ground vehicles (UGV). A method is proposed to evaluate the quality of sensor data prior to its use in a perception system by utilising a quality metric applied to heterogeneous sensor data such as visual and infrared camera images. The concept is illustrated specifically with sensor data that is evaluated prior to the use of the data in a standard SIFT feature extraction and matching technique. The method is then evaluated using various experimental data sets that were collected from a UGV in challenging environmental conditions, represented by the presence of airborne dust and smoke. In the first series of experiments, a motionless vehicle is observing a ’reference’ scene, then the method is extended to the case of a moving vehicle by compensating for its motion. This paper shows that it is possible to anticipate degradation of a perception algorithm by evaluating the input data prior to any actual execution of the algorithm.
Abstract:
The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of background noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean, speech corpora to be mixed with a wide variety of noise conditions, environmental reverberant responses, and signal-to-noise ratios, this protocol provides a solid basis for the development, evaluation and benchmarking of robust speaker recognition algorithms, and is freely available to download alongside the QUT-NOISE database. In this work, we use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art PLDA i-vector speaker recognition system, demonstrating the importance of designing voice-activity-detection front-ends specifically for speaker recognition, rather than aiming for perfect coherence with the true speech/non-speech boundaries.
An Intervention Study to Improve the Transfer of ICU Patients to the Ward - Evaluation by ICU Nurses