276 results for WORD


Relevance:

10.00% 10.00%

Publisher:

Abstract:

In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology have made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language-specific resources. This motivates the need for techniques which reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource-poor languages. Cross-lingual ASR emerges as a means of addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract and, accordingly, is human-specific rather than language-specific. As a result, a common inventory of sounds exists across languages, a property which is exploitable, as sounds from a resource-poor target language can be recognised using models trained on resource-rich source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross-lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross-lingual techniques using two speech environments: clean read speech and conversational telephone speech. Languages used in evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly in the more taxing conversational environment.
Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target language lexical representation, in terms of the source language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal in this thesis is directed towards improving cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. In Indonesia alone, there are over 200 million speakers of some Malay variant, which provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10000 word recognition vocabulary for the Indonesian language.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Literally, the word compliance suggests conformity in fulfilling official requirements. The thesis presents the results of the analysis and design of a class of protocols called compliant cryptologic protocols (CCP). The thesis presents a notion of compliance in cryptosystems that is suitable as a cryptologic goal. CCP are employed in security systems used by at least two mutually mistrusting sets of entities. The individuals in the sets of entities only trust the design of the security system and any trusted third party the security system may include. Such a security system can be thought of as a broker between the mistrusting sets of entities. In order to provide confidence in operation for the mistrusting sets of entities, CCP must provide compliance verification mechanisms. These mechanisms are employed either by all the entities or by a set of authorised entities in the system to verify the compliance of the behaviour of various participating entities with the rules of the system. It is often stated that confidentiality, integrity and authentication are the primary interests of cryptology. It is evident from the literature that authentication mechanisms employ confidentiality and integrity services to achieve their goal. Therefore, the fundamental services that any cryptographic algorithm may provide are confidentiality and integrity only. Since controlling the behaviour of the entities is not a feasible cryptologic goal, the verification of the confidentiality of any data is a futile cryptologic exercise. For example, there exists no cryptologic mechanism that would prevent an entity from willingly or unwillingly exposing its private key corresponding to a certified public key. The confidentiality of the data can only be assumed. Therefore, any verification in cryptologic protocols must take the form of integrity verification mechanisms. Thus, compliance verification must take the form of integrity verification in cryptologic protocols.
A definition of compliance that is suitable as a cryptologic goal is presented as a guarantee on the confidentiality and integrity services. The definitions are employed to provide a classification mechanism for various message formats in a cryptologic protocol. The classification assists in the characterisation of protocols, which assists in providing a focus for the goals of the research. The resulting concrete goal of the research is the study of those protocols that employ message formats to provide restricted confidentiality and universal integrity services to selected data. The thesis proposes an informal technique to understand, analyse and synthesise the integrity goals of a protocol system. The thesis contains a study of key recovery, electronic cash, peer-review, electronic auction, and electronic voting protocols. All these protocols contain message formats that provide restricted confidentiality and universal integrity services to selected data. The study of key recovery systems aims to achieve robust key recovery relying only on the certification procedure and without the need for tamper-resistant system modules. The result of this study is a new technique for the design of key recovery systems called hybrid key escrow. The thesis identifies a class of compliant cryptologic protocols called secure selection protocols (SSP). The uniqueness of this class of protocols is the similarity in the goals of the member protocols, namely peer-review, electronic auction and electronic voting. The problem statement describing the goals of these protocols contains a tuple, (I, D), where I usually refers to an identity of a participant and D usually refers to the data selected by the participant. SSP are interested in providing a confidentiality service to the tuple for hiding the relationship between I and D, and an integrity service to the tuple after its formation to prevent the modification of the tuple.
The thesis provides a schema to solve the instances of SSP by employing the electronic cash technology. The thesis makes a distinction between electronic cash technology and electronic payment technology. It will treat electronic cash technology to be a certification mechanism that allows the participants to obtain a certificate on their public key, without revealing the certificate or the public key to the certifier. The thesis abstracts the certificate and the public key as the data structure called anonymous token. It proposes design schemes for the peer-review, e-auction and e-voting protocols by employing the schema with the anonymous token abstraction. The thesis concludes by providing a variety of problem statements for future research that would further enrich the literature.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
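The verifier fusion idea described above can be illustrated with a small sketch. The abstract does not state which fusion rule was used, so the weighted linear combination below, the names `fuse_scores` and `accept_keyword`, and the example scores, weights and threshold are all assumptions chosen purely for illustration.

```python
def fuse_scores(scores, weights):
    """Weighted linear fusion of per-verifier confidence scores.

    The linear rule is an assumption; the source does not name its fusion method.
    """
    if len(scores) != len(weights):
        raise ValueError("one weight per verifier score is required")
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


def accept_keyword(scores, weights, threshold):
    """Accept a putative keyword occurrence when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Hypothetical scores from a background-model verifier and a cohort word verifier.
fused = fuse_scores([0.9, 0.5], [1.0, 1.0])
decision = accept_keyword([0.9, 0.5], [1.0, 1.0], threshold=0.6)
```

In practice the weights and threshold would be tuned on held-out data; the equal weights here are only a placeholder.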

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This thesis introduces the problem of conceptual ambiguity, or Shades of Meaning (SoM), that can exist around a term or entity. As an example, consider Ronald Reagan, the former president of the USA: there are many aspects to him that are captured in text, such as the Russian missile deal, the Iran-contra deal and others. Simply finding documents with the word “Reagan” in them is going to return results that cover many different shades of meaning related to “Reagan”. Instead it may be desirable to retrieve results around a specific shade of meaning of “Reagan”, e.g., all documents relating to the Iran-contra scandal. This thesis investigates computational methods for identifying shades of meaning around a word or concept. This problem is related to word sense ambiguity, but is more subtle and based less on the particular syntactic structures associated with or around an instance of the term and more on the semantic contexts around it. A particularly noteworthy difference from typical word sense disambiguation is that shades of a concept are not known in advance. It is up to the algorithm itself to ascertain these subtleties. It is the key hypothesis of this thesis that reducing the number of dimensions in the representation of concepts is a key part of reducing sparseness and thus also crucial in discovering their SoM within a given corpus.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity in the use of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them suited for visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images is dependent on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion.
Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they use to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The better of these two new keypoint detectors is applied to vision-based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters, and; (c) is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state of the art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
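To make the contrast between magnitude-only and complex-domain subtraction concrete, the following is a minimal per-bin sketch. It is not the paper's CSS or PEDEP formulation: the function names, the spectral floor value, and the assumption that a complex noise estimate (magnitude and phase) is available for each frequency bin are all illustrative.

```python
import cmath


def magnitude_subtract(noisy_bin, noise_mag, floor=0.01):
    """Magnitude-only spectral subtraction for one frequency bin.

    Subtracts the estimated noise magnitude, applies a spectral floor
    (an assumed value), and reuses the noisy phase.
    """
    noisy_mag = abs(noisy_bin)
    clean_mag = max(noisy_mag - noise_mag, floor * noisy_mag)
    return cmath.rect(clean_mag, cmath.phase(noisy_bin))


def complex_subtract(noisy_bin, noise_bin):
    """Subtraction in the complex domain; needs a phase estimate for the noise."""
    return noisy_bin - noise_bin


# Toy bin: clean speech and noise are 90 degrees out of phase.
speech = complex(1.0, 0.0)
noise = complex(0.0, 1.0)
noisy = speech + noise

mag_only = magnitude_subtract(noisy, abs(noise))  # magnitude about 0.414, phase of the noisy bin
with_phase = complex_subtract(noisy, noise)       # recovers the clean bin in this oracle case
```

The toy case corresponds to the oracle condition mentioned in the abstract, where the noise phase is known exactly; with an inaccurate phase estimate the complex-domain result would degrade accordingly.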

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Driver simulators provide safe conditions to assess driver behaviour and provide controlled and repeatable environments for study. They are a promising research tool in terms of providing both safety and experimentally well controlled environments. There is a wide range of driver simulators, from laptops to advanced technologies which are controlled by several computers in a real car mounted on platforms with six degrees of freedom of movement. The applicability of simulator-based research in a particular study needs to be considered before starting the study, to determine whether the use of a simulator is actually appropriate for the research. Given the wide range of driver simulators and their uses, it is important to know beforehand how closely the results from a driver simulator match results found in the real world. Comparison between drivers’ performance under real road conditions and in particular simulators is a fundamental part of validation. The important question is whether the results obtained in a simulator mirror real world results. In this paper, the results of the most recently conducted research into the validity of simulators are presented.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Traditionally, consumers who have been dissatisfied with service have complained to frontline personnel or to a manager, either directly (face-to-face or over the phone) or indirectly in writing, or have done nothing but tell friends and family of the incident. More recently, the Internet has provided various “new” ways to air a grievance, especially when little might have been done at the point of service failure. With the opportunity to now spread word-of-mouth globally, consumers have the potential to impact the standing of a brand or a firm's reputation. The hotel industry is particularly vulnerable, as an increasing number of bookings are undertaken via the Internet and the decision process is likely to be influenced by what previous guests might post on many booking-linked sites. We conducted a qualitative study of a key travel site to ascertain the forms and motives of complaints made online about hotels and resorts. 200 web-based consumer complaints were analyzed using NVivo 8 software. Findings revealed that consumers report a wide range of service failures on the Internet. They tell a highly descriptive, persuasive, and credible story, often motivated by altruism or, at the other end of the continuum, by revenge. These stories have the power to influence potential guests to book or not book accommodation at the affected properties. Implications for managers of hotels and resorts are discussed.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross-correlation based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
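For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence; the function name and the toy word sequences are illustrative, and the challenge's actual scoring tool is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    if not reference:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Toy example: one substitution and one deletion against four reference words.
wer = word_error_rate("a b c d".split(), "a x c".split())
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.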

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Voice recognition is one of the key enablers to reduce driver distraction as in-vehicle systems become more and more complex. With the integration of voice recognition in vehicles, safety and usability are improved as the driver’s eyes and hands are not required to operate system controls. Whilst speaker-independent voice recognition is well developed, performance in high noise environments (e.g. vehicles) is still limited. La Trobe University and Queensland University of Technology have developed a low-cost hardware-based speech enhancement system for automotive environments based on spectral subtraction and delay–sum beamforming techniques. The enhancement algorithms have been optimised using authentic Australian English collected under typical driving conditions. Performance tests conducted using speech data collected under a variety of vehicle noise conditions demonstrate a word recognition rate improvement in the order of 10% or more under the noisiest conditions. The system is currently developed to a proof-of-concept stage, and there is potential for even greater performance improvement.
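The delay-and-sum principle mentioned above can be sketched in a few lines: each microphone channel is advanced by its arrival delay so that the target speech lines up across channels, and the aligned channels are averaged so that uncorrelated noise partially cancels. This is a simplified time-domain illustration with integer sample delays assumed to be known; the function name and toy signals are hypothetical and do not reflect the hardware system described.

```python
def delay_and_sum(channels, delays):
    """Time-domain delay-and-sum beamformer with integer sample delays.

    channels: list of equal-length sample lists, one per microphone.
    delays:   non-negative arrival delay (in samples) of the target at each microphone.
    Returns the averaged, time-aligned signal (shortened by the largest delay).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    length = len(channels[0]) - max(delays)
    output = []
    for t in range(length):
        # Reading channel m at t + delay undoes that microphone's arrival delay.
        aligned_sum = sum(ch[t + d] for ch, d in zip(channels, delays))
        output.append(aligned_sum / len(channels))
    return output


# Toy example: the second microphone hears the same ramp two samples later.
source = [0, 1, 2, 3, 4, 5]
mic_a = source
mic_b = [0, 0] + source[:4]
enhanced = delay_and_sum([mic_a, mic_b], [0, 2])
```

A practical system would also need to estimate the delays and handle fractional-sample shifts, neither of which is shown here.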

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The artwork was created to respond to the exhibition theme, "DIGILOG+IN". It aimed to express the beauty that arises when digital and analogue materials are combined. It visualised an organic harmony between digital and natural objects through digitalisation and built a fantasy of a digital world. However, there was a conceptual dilemma in that a “digitalisation” of natural objects into a digital format would merely become a digital work. In other words, a harmony between digital and analogue (natural) can only be achieved through a digitalising process that removes the intrinsic nature of analogues. Therefore, the substance of analogues no longer exists in a digitally visualised form, but is virtually represented. The title of the artwork, “digitualisation”, is a word combining “digi-tal” and “vir-tualisation”. It refers to digitally virtualising the substance of natural objects. The artwork visualised the concept of digitualisation by using natural objects (flowers) that are merged within a virtual space (a building entrance foyer).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Curriculum demands continue to increase on school education systems with teachers at the forefront of implementing syllabus requirements. Education is reported frequently as a solution to most societal problems and, as a result of the world’s information explosion, teachers are expected to cover more and more within teaching programs. How can teachers combine subjects in order to capitalise on the competing educational agendas within school timeframes? Fusing curricula requires the bonding of standards from two or more syllabuses. Both technology and ICT complement the learning of science. This study analyses selected examples of preservice teachers’ overviews for fusing science, technology and ICT. These program overviews focused on primary students and the achievement of two standards (one from science and one from either technology or ICT). These primary preservice teachers’ fused-curricula overviews included scientific concepts and related technology and/or ICT skills and knowledge. Findings indicated a range of innovative curriculum plans for teaching primary science through technology and ICT, demonstrating that these subjects can form cohesive links towards achieving the respective learning standards. Teachers can work more astutely by fusing curricula; however further professional development may be required to advance thinking about these processes. Bonding subjects through their learning standards can extend beyond previous integration or thematic work where standards may not have been assessed. Education systems need to articulate through syllabus documents how effective fusing of curricula can be achieved. It appears that education is a key avenue for addressing societal needs, problems and issues. Education is promoted as a universal solution, which has resulted in curriculum overload (Dare, Durand, Moeller, & Washington, 1997; Vinson, 2001). 
Societal and curriculum demands have placed added pressure on teachers with many extenuating education issues increasing teachers’ workloads (Mobilise for Public Education, 2002). For example, as Australia has weather conducive for outdoor activities, social problems and issues arise that are reported through the media calling for action; consequently schools have been involved in swimming programs, road and bicycle safety programs, and a wide range of activities that had been considered a parental responsibility in the past. Teachers are expected to plan, implement and assess these extra-curricular activities within their already overcrowded timetables. At the same time, key learning areas (KLAs) such as science and technology are mandatory requirements within all Australian education systems. These systems have syllabuses outlining levels of content and the anticipated learning outcomes (also known as standards, essential learnings, and frameworks). Time allocated for teaching science is obviously an issue. In 2001, it was estimated that on average the time spent in teaching science in Australian primary schools was almost an hour per week (Goodrum, Hackling, & Rennie, 2001). More recently, a study undertaken in the U.S. reported a similar finding. More than 80% of the teachers in K-5 classrooms spent less than an hour teaching science (Dorph, Goldstein, Lee, et al., 2007). More importantly, 16% did not spend any time teaching science in their classrooms. Teachers need to learn to work smarter by optimising the use of their in-class time. Integration is proposed as one of the ways to address the issue of curriculum overload (Venville & Dawson, 2005; Vogler, 2003). Even though there may be a lack of definition for integration (Hurley, 2001), curriculum integration aims at covering key concepts in two or more subject areas within the same lesson (Buxton & Whatley, 2002).
This implies covering the curriculum in less time than if the subjects were taught separately; therefore teachers should have more time to cover other educational issues. Expectedly, the reality can be decidedly different (e.g., Brophy & Alleman, 1991; Venville & Dawson, 2005). Nevertheless, teachers report that students expand their knowledge and skills as a result of subject integration (James, Lamb, Householder, & Bailey, 2000). There seems to be considerable value for integrating science with other KLAs besides aiming to address teaching workloads. Over two decades ago, Cohen and Staley (1982) claimed that integration can bring a subject into the primary curriculum that may be otherwise left out. Integrating science education aims to develop a more holistic perspective. Indeed, life is not neat components of stand-alone subjects; life integrates subject content in numerous ways, and curriculum integration can assist students to make these real-life connections (Burnett & Wichman, 1997). Science integration can provide the scope for real-life learning and the possibility of targeting students’ learning styles more effectively by providing more than one perspective (Hudson & Hudson, 2001). To illustrate, technology is essential to science education (Blueford & Rosenbloom, 2003; Board of Studies, 1999; Penick, 2002), and constructing technology immediately evokes a social purpose for such construction (Marker, 1992). For example, building a model windmill requires science and technology (Zubrowski, 2002) but has a key focus on sustainability and the social sciences. Science has the potential to be integrated with all KLAs (e.g., Cohen & Staley, 1982; Dobbs, 1995; James et al., 2000). Yet, “integration” appears to be a confusing term. Integration has an educational meaning focused on special education students being assimilated into mainstream classrooms. 
The word integration was used in the late seventies and generally focused around thematic approaches for teaching. For instance, a science theme about flight only had to have a student drawing a picture of a plane to show integration; it did not connect the anticipated outcomes from science and art. The term “fusing curricula” presents a seamless bonding between two subjects; hence standards (or outcomes) need to be linked from both subjects. This also goes beyond just embedding one subject within another. Embedding implies that one subject is dominant, while fusing curricula proposes an equal mix of learning within both subject areas. Primary education in Queensland has eight KLAs, each with its established content and each with a proposed structure for levels of learning. Primary teachers attempt to cover these syllabus requirements across the eight KLAs in less than five hours a day, and between many of the extra-curricular activities occurring throughout a school year (e.g., Easter activities, Education Week, concerts, excursions, performances). In Australia, education systems have developed standards for all KLAs (e.g., Education Queensland, NSW Department of Education and Training, Victorian Education), usually designated by a code. In the late 1990s (in Queensland), “core learning outcomes” were developed for strands across all KLAs. For example, LL2.1 for the Queensland Education science syllabus means Life and Living at Level 2 standard number 1. Thus, a teacher’s planning requires the inclusion of standards as indicated by the presiding syllabus. More recently, the core learning outcomes were replaced by “essential learnings”. They specify “what students should be taught and what is important for students to have opportunities to know, understand and be able to do” (Queensland Studies Authority, 2009, para. 1).
Fusing science education with other KLAs may facilitate more efficient use of time and resources; however this type of planning needs to combine standards from two syllabuses. To further assist in facilitating sound pedagogical practices, there are models proposed for learning science, technology and other KLAs, such as Bloom’s Taxonomy (Bloom, 1956), Productive Pedagogies (Education Queensland, 2004), de Bono’s Six Hats (de Bono, 1985), and Gardner’s Multiple Intelligences (Gardner, 1999), that imply, warrant, or necessitate fused curricula. Bybee’s 5 Es, for example, which has five levels of learning (engage, explore, explain, elaborate, and evaluate; Bybee, 1997), has the potential for fusing science and ICT standards.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The travel and hospitality industry is one which relies crucially on word of mouth, both at the level of overall destinations (Australia, Queensland, Brisbane) and at the level of travellers’ individual choices of hotels, restaurants, and sights during their trips. The provision of such word-of-mouth information has been revolutionised over the past decade by the rise of community-based Websites which allow their users to share information about their past and future trips and advise one another on what to do or what to avoid during their travels. Indeed, the impact of such user-generated reviews, ratings, and recommendations sites has been such that established commercial travel advisory publishers such as Lonely Planet have experienced a pronounced downturn in sales – unless they have managed to develop their own ways of incorporating user feedback and contributions into their publications. This report examines the overall significance of ratings and recommendation sites to the travel industry, and explores the community, structural, and business models of a selection of relevant ratings and recommendations sites. We identify a range of approaches which are appropriate to the respective target markets and business aims of these organisations, and conclude that there remain significant opportunities for further operators, especially if they aim to cater for communities which are not yet appropriately served by specific existing sites. Additionally, we point to the increasing importance of connecting stand-alone ratings and recommendations sites with general social media spaces like Facebook, Twitter, and LinkedIn, and of providing mobile interfaces which enable users to provide updates and ratings directly from the locations they happen to be visiting.
In this report, we profile the following sites:
* TripAdvisor, the international market leader for travel ratings and recommendations sites, with a membership of some 11 million users;
* IgoUgo, the other leading site in this field, which aims to distinguish itself from the market leader by emphasising the quality of its content;
* Zagat, a long-established publisher of restaurant guides which has translated its crowdsourcing model from the offline to the online world;
* Lonely Planet’s Thorn Tree site, which attempts to respond to the rise of these travel communities by similarly harnessing user-generated content;
* Stayz, which attempts to enhance its accommodation search and booking services by incorporating ratings and reviews functionality;
* BigVillage, an Australian-based site attempting to cater for a particularly discerning niche of travellers;
* Dopplr, which connects travel and social networking in a bid to pursue the lucrative market of frequent and business travellers;
* Foursquare, which builds on its mobile application to generate a steady stream of ‘check-ins’ and recommendations for hospitality and other services around the world;
* Suite 101, which uses a revenue-sharing model to encourage freelance writers to contribute travel writing (amongst other genres of writing); and
* Yelp, the global leader in general user-generated product review and recommendation services.
In combination, these profiles provide an overview of current developments in the travel ratings and recommendations space (and beyond), and offer an outlook for further possibilities. While no doubt affected by the global financial downturn and the reduction in travel that it has caused, travel ratings and recommendations remain important – perhaps even more so if a reduction in disposable income has resulted in consumers becoming more critical and discerning.
The aggregated word of mouth from many tens of thousands of travellers which these sites provide certainly has a substantial influence on their users. Using these sites to research travel options has now become an activity which has spread well beyond the digerati. The same is also true for many other consumer industries, especially where a significant variety of products is available – and so this report may also be read as a case study whose findings can be translated, mutatis mutandis, to purchasing decisions ranging from household goods through consumer electronics to automobiles.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes: firstly, accommodating phone recognition errors and, secondly, modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out and a number of novel enhancements are developed that provide for faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS, by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing.
Analysis shows that this use of language modelling can, however, be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework are proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing for a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state of the art in phonetic STD, by improving the utility of such systems in a wide range of applications.
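
The idea of approximate phone sequence matching under a phone error cost model, mentioned in the abstract above, can be illustrated with a generic weighted edit distance. The sketch below is not the DMLS system described in the thesis; the phone symbols, the cost values in `SUB_COST`, the insertion and deletion costs, and the threshold in `is_hit` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical substitution costs for acoustically confusable phone pairs.
# Values are illustrative only; they are not taken from the thesis.
SUB_COST: Dict[Tuple[str, str], float] = {
    ("p", "b"): 0.3,
    ("t", "d"): 0.3,
    ("s", "z"): 0.4,
}
DEFAULT_SUB = 1.0  # assumed cost for any other substitution
INS_COST = 1.0     # assumed cost of an extra phone in the indexed sequence
DEL_COST = 1.0     # assumed cost of a target phone missing from the index


def sub_cost(a: str, b: str) -> float:
    """Cost of observing phone b where phone a was expected (symmetric)."""
    if a == b:
        return 0.0
    return SUB_COST.get((a, b), SUB_COST.get((b, a), DEFAULT_SUB))


def match_cost(target: Sequence[str], observed: Sequence[str]) -> float:
    """Weighted edit distance between a target and an observed phone sequence."""
    # prev[j] holds the cost of aligning the target prefix processed so far
    # with the first j observed phones.
    prev = [j * INS_COST for j in range(len(observed) + 1)]
    for i, t in enumerate(target, start=1):
        curr = [i * DEL_COST]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + DEL_COST,            # target phone deleted
                curr[j - 1] + INS_COST,        # observed phone inserted
                prev[j - 1] + sub_cost(t, o),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def is_hit(target: Sequence[str], observed: Sequence[str],
           threshold: float = 0.5) -> bool:
    """Report a putative term occurrence when the match cost is low enough."""
    return match_cost(target, observed) <= threshold
```

With these assumed costs, a search for the phone sequence `k ae t` would still accept an indexed sequence `k ae d`, because the substitution of `d` for `t` is cheap, while an indexed sequence missing the final phone would be rejected. Lowering the threshold trades missed detections for fewer false alarms.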