51 resultados para keyword spotting
em Queensland University of Technology - ePrints Archive
Resumo:
Keyword Spotting is the task of detecting keywords of interest within continu- ous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unre- stricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have su®ered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the ¯eld of keyword spotting. The ¯rst major contribution is the development of a novel keyword veri¯cation method named Cohort Word Veri¯cation. This method combines high level lin- guistic information with cohort-based veri¯cation techniques to obtain dramatic improvements in veri¯cation performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique aug- ments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains signi¯cant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple veri¯er fusion for the task of keyword veri¯cation. The reported experiments demonstrate that substantial improvements in veri¯cation performance can be obtained through the fusion of multiple keyword veri¯ers. The research focuses on combinations of speech background model based veri¯ers and cohort word veri¯ers. The ¯nal major contribution is a comprehensive study of the e®ects of limited training data for keyword spotting. This study is performed with consideration as to how these e®ects impact the immediate development and deployment of speech technologies for non-English languages.
Resumo:
The support for typically out-of-vocabulary query terms such as names, acronyms, and foreign words is an important requirement of many speech indexing applications. However, to date many unrestricted vocabulary indexing systems have struggled to provide a balance between good detection rate and fast query speeds. This paper presents a fast and accurate unrestricted vocabulary speech indexing technique named Dynamic Match Lattice Spotting (DMLS). The proposed method augments the conventional lattice spotting technique with dynamic sequence matching, together with a number of other novel algorithmic enhancements, to obtain a system that is capable of searching hours of speech in seconds while maintaining excellent detection performance
Resumo:
For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes, that is, accommodating phone recognition errors and, secondly, modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out and a number of novel enhancements are developed that provide for faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for drastically increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS, by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing. Analysis shows that this use of language modelling can, however, be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework is proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing for a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state-of-the-art in phonetic STD, by improving the utility of such systems in a wide range of applications.
Resumo:
Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. In order to provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we will make use of visual information in the form of lip movements of the speaker in indexing stage and will investigate its effect on STD performance. Particularly, we will investigate if gains in phone recognition accuracy will carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio only approach. We will also investigate the effect of using visual information on STD performance in different noise environments.
Resumo:
The world we live in is well labeled for the benefit of humans but to date robots have made little use of this resource. In this paper we describe a system that allows robots to read and interpret visible text and use it to understand the content of the scene. We use a generative probabilistic model that explains spotted text in terms of arbitrary search terms. This allows the robot to understand the underlying function of the scene it is looking at, such as whether it is a bank or a restaurant. We describe the text spotting engine at the heart of our system that is able to detect and parse wild text in images, and the generative model, and present results from images obtained with a robot in a busy city setting.
Resumo:
A growing body of research is looking at ways to bring the processes and benefits of online deliberation to the places they are about and in turn allow a larger, targeted proportion of the urban public to have a voice, be heard, and engage in questions of city planning and design. Seeking to take advantage of the civic opportunities of situated engagement through public screens and mobile devices, our research informed a public urban screen content application DIS that we deployed and evaluated in a wide range of real world public and urban environments. For example, it is currently running on the renowned urban screen at Federation Square in Melbourne. We analysed the data from these user studies within a conceptual framework that positions situated engagement across three key parameters: people, content, and location. We propose a way to identify the sweet spot within the nexus of these parameters to help deploy and run interactive systems to maximise the quality of the situated engagement for civic and related deliberation purposes.
Resumo:
This paper describes a study designed to understand player responses to artificially intelligent opponents in multi-player First Person Shooter games. It examines the player's ability to tell the difference between artificially intelligent opponents and other human players, and investigates the players' perceptions of these opponents. The study examines player preferences in this regard and identifies the significance of the cues and signs players use to categorise an opponent as artificial or human.
Resumo:
We consider the following problem: members in a dynamic group retrieve their encrypted data from an untrusted server based on keywords and without any loss of data confidentiality and member’s privacy. In this paper, we investigate common secure indices for conjunctive keyword-based retrieval over encrypted data, and construct an efficient scheme from Wang et al. dynamic accumulator, Nyberg combinatorial accumulator and Kiayias et al. public-key encryption system. The proposed scheme is trapdoorless and keyword-field free. The security is proved under the random oracle, decisional composite residuosity and extended strong RSA assumptions.
Resumo:
We consider the following problem: users of an organization wish to outsource the storage of sensitive data to a large database server. It is assumed that the server storing the data is untrusted so the data stored have to be encrypted. We further suppose that the manager of the organization has the right to access all data, but a member of the organization can not access any data alone. The member must collaborate with other members to search for the desired data. In this paper, we investigate the notion of threshold privacy preserving keyword search (TPPKS) and define its security requirements. We construct a TPPKS scheme and show the proof of security under the assumptions of intractability of discrete logarithm, decisional Diffie-Hellman and computational Diffie-Hellman problems.
Resumo:
We consider the following problem: users in a dynamic group store their encrypted documents on an untrusted server, and wish to retrieve documents containing some keywords without any loss of data confidentiality. In this paper, we investigate common secure indices which can make multi-users in a dynamic group to obtain securely the encrypted documents shared among the group members without re-encrypting them. We give a formal definition of common secure index for conjunctive keyword-based retrieval over encrypted data (CSI-CKR), define the security requirement for CSI-CKR, and construct a CSI-CKR based on dynamic accumulators, Paillier’s cryptosystem and blind signatures. The security of proposed scheme is proved under strong RSA and co-DDH assumptions.
Resumo:
A dynamic accumulator is an algorithm, which merges a large set of elements into a constant-size value such that for an element accumulated, there is a witness confirming that the element was included into the value, with a property that accumulated elements can be dynamically added and deleted into/from the original set. Recently Wang et al. presented a dynamic accumulator for batch updates at ICICS 2007. However, their construction suffers from two serious problems. We analyze them and propose a way to repair their scheme. We use the accumulator to construct a new scheme for common secure indices with conjunctive keyword-based retrieval.
Resumo:
We consider the following problem: a user stores encrypted documents on an untrusted server, and wishes to retrieve all documents containing some keywords without any loss of data confidentiality. Conjunctive keyword searches on encrypted data have been studied by numerous researchers over the past few years, and all existing schemes use keyword fields as compulsory information. This however is impractical for many applications. In this paper, we propose a scheme of keyword field-free conjunctive keyword searches on encrypted data, which affirmatively answers an open problem asked by Golle et al. at ACNS 2004. Furthermore, the proposed scheme is extended to the dynamic group setting. Security analysis of our constructions is given in the paper.
Resumo:
Traditional information retrieval (IR) systems respond to user queries with ranked lists of relevant documents. The separation of content and structure in XML documents allows individual XML elements to be selected in isolation. Thus, users expect XML-IR systems to return highly relevant results that are more precise than entire documents. In this paper we describe the implementation of a search engine for XML document collections. The system is keyword based and is built upon an XML inverted file system. We describe the approach that was adopted to meet the requirements of Content Only (CO) and Vague Content and Structure (VCAS) queries in INEX 2004.
Resumo:
The historical challenge of environmental impact assessment (EIA) has been to predict project-based impacts accurately. Both EIA legislation and the practice of EIA have evolved over the last three decades in Canada, and the development of the discipline and science of environmental assessment has improved how we apply environmental assessment to complex projects. The practice of environmental assessment integrates the social and natural sciences and relies on an eclectic knowledge base from a wide range of sources. EIA methods and tools provide a means to structure and integrate knowledge in order to evaluate and predict environmental impacts.----- This Chapter will provide a brief overview of how impacts are identified and predicted. How do we determine what aspect of the natural and social environment will be affected when a mine is excavated? How does the practitioner determine the range of potential impacts, assess whether they are significant, and predict the consequences? There are no standard answers to these questions, but there are established methods to provide a foundation for scoping and predicting the potential impacts of a project.----- Of course, the community and publics play an important role in this process, and this will be discussed in subsequent chapters. In the first part of this chapter, we will deal with impact identification, which involves appplying scoping to critical issues and determining impact significance, baseline ecosystem evaluation techniques, and how to communicate environmental impacts. In the second part of the chapter, we discuss the prediction of impacts in relation to the complexity of the environment, ecological risk assessment, and modelling.