189 results for Speech articulation tests
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
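The dynamic sequence matching idea can be illustrated with a minimal sketch. It assumes, and this is not stated in the abstract, that a target phone sequence is compared with a phone sequence read off a lattice path using minimum edit distance with unit costs. The phone labels, the threshold of 1 and the function name `edit_distance` are hypothetical.

```python
def edit_distance(target, observed):
    """Minimum edit distance between two phone sequences (unit costs)."""
    # prev[j] holds the distance between the target prefix seen so far
    # and the first j observed phones.
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            cost = 0 if t == o else 1
            curr.append(min(prev[j] + 1,          # target phone missing from observed
                            curr[j - 1] + 1,      # extra phone in observed
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# Hypothetical target and a lattice path with one misrecognised vowel.
target = ["k", "ae", "t"]
observed = ["k", "eh", "t"]
distance = edit_distance(target, observed)
is_hit = distance <= 1  # accept paths within an assumed distance threshold
```

Allowing a small non-zero distance is what lets a search tolerate a single recognition error in the lattice while still rejecting unrelated sequences.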
Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers speaking different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
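Linear fusion of subsystem scores can be sketched as a weighted sum. The example below is illustrative only: the per-language scores, the equal weighting and the function name `fuse_scores` are assumptions, and the abstract does not say how the fusion weights were chosen.

```python
def fuse_scores(acoustic, phonetic, weight=0.5):
    """Weighted linear combination of per-language scores from two subsystems."""
    return {lang: weight * acoustic[lang] + (1.0 - weight) * phonetic[lang]
            for lang in acoustic}


# Hypothetical log-likelihood-style scores for one utterance.
acoustic = {"english": -1.2, "mandarin": -0.8, "spanish": -2.0}
phonetic = {"english": -0.5, "mandarin": -1.5, "spanish": -1.9}

fused = fuse_scores(acoustic, phonetic, weight=0.5)
best = max(fused, key=fused.get)  # language with the highest fused score
```

In this made-up case the acoustic scores alone would favour Mandarin, while the fused scores favour English, which is the kind of correction fusion is meant to allow.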
Abstract:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests are commonly applied incorrectly, misinterpreted, or ignored by novices and expert researchers alike. At first glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing are first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
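The mechanics listed above (state the hypotheses, compute a test statistic, compare it with a standard distribution) can be sketched with a two-sided one-sample z-test. The travel-time figures, the known standard deviation and the 0.05 significance level are invented for illustration and do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mean == mu0, with known standard deviation."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least this extreme.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 64 trips averaging 31.5 minutes, tested against
# H0: mean travel time is 30 minutes, with sigma assumed to be 6 minutes.
z, p = one_sample_z_test(sample_mean=31.5, mu0=30.0, sigma=6.0, n=64)
reject_null = p < 0.05  # assumed significance level
```

A small p-value only measures how surprising the data would be under the null hypothesis; it says nothing about whether a 1.5-minute difference matters in practice.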
Abstract:
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average relative word accuracy improvement of 20% when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential of the proposed PEDEP technique in ideal conditions. A realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
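The difference between magnitude-only and complex subtraction can be seen on a single frequency bin. This is a minimal sketch under oracle assumptions (the noise magnitude and phase are known exactly); it is not the paper's CSS algorithm or PEDEP, and the bin values and function names are hypothetical.

```python
import cmath


def magnitude_subtraction(noisy, noise_mag):
    """Subtract the noise magnitude and reuse the noisy phase."""
    mag = max(abs(noisy) - noise_mag, 0.0)  # floor at zero
    return cmath.rect(mag, cmath.phase(noisy))


def complex_subtraction(noisy, noise_mag, noise_phase):
    """Subtract a complex noise estimate built from magnitude and phase."""
    return noisy - cmath.rect(noise_mag, noise_phase)


# One hypothetical spectral bin: clean speech plus additive noise.
clean = cmath.rect(1.0, 0.3)
noise = cmath.rect(0.5, 2.0)
noisy = clean + noise

est_mag_only = magnitude_subtraction(noisy, abs(noise))
est_complex = complex_subtraction(noisy, abs(noise), cmath.phase(noise))

# Error in the recovered clean-speech magnitude for each method.
err_mag_only = abs(abs(est_mag_only) - abs(clean))
err_complex = abs(abs(est_complex) - abs(clean))
```

With an exact phase the complex subtraction recovers the clean magnitude up to rounding, whereas the magnitude-only estimate is off because the speech and noise components are not in phase. With an estimated phase the gain depends on how accurate that estimate is.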
Abstract:
In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a priori knowledge of the microphone placement and speaker location, making the system directly comparable to other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross-correlation-based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
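Cross-correlation-based localisation usually starts from the time delay between two microphones. The sketch below estimates that delay by brute-force cross-correlation on a toy signal; the pulse values, the lag range and the function name `estimate_delay` are assumptions, and the abstract does not specify which cross-correlation variant the authors used.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):  # ignore samples shifted out of range
                score += x * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical pulse seen by two microphones, 3 samples apart.
pulse = [0.0, 1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
mic_a = pulse
mic_b = [0.0, 0.0, 0.0] + pulse[:-3]  # same pulse arriving 3 samples later

lag = estimate_delay(mic_a, mic_b, max_lag=5)
```

Given the sampling rate and the microphone spacing, such a delay can be converted into a direction of arrival, which is the quantity a beamformer needs.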
Abstract:
Interacting with technology within a vehicle environment using a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem only utilise the audio signal, making them susceptible to acoustic noise. An obvious way to circumvent this is to use the visual modality in addition. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One current dataset available for such research is the AVICAR [1] database. Unfortunately, this database is largely unusable due to a timing mismatch between the two streams; in addition, no protocol is available. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and have established a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.
Abstract:
This paper presents the details of experimental studies on the shear strength of a recently developed, cold-formed steel beam known as LiteSteel Beam (LSB) with web openings. The innovative LSB sections have the beneficial characteristics of torsionally rigid closed rectangular flanges combined with economical fabrication processes from a single strip of high strength steel. They combine the stability of hot-rolled steel sections with the high strength to weight ratio of conventional cold-formed steel sections. The LSB sections are commonly used as flexural members in the building industry. Current practice in flooring systems is to include openings in the web element of floor joists or bearers so that building services can be located within them. The shear behaviour of LSBs with web openings is more complicated, and their shear strengths are considerably reduced by the presence of the openings. However, limited research has been undertaken on the shear behaviour and strength of LSBs with web openings. Therefore, a detailed experimental study involving 26 shear tests was undertaken to investigate the shear behaviour and strength of different LSB sections. Simply supported test specimens of LSBs with an aspect ratio of 1.5 were loaded at midspan until failure. This paper presents the details of this experimental study and the results. Experimental results showed that the current design rules in cold-formed steel structures design codes (AS/NZS 4600) [1] are very conservative for the shear design of LSBs with web openings. Improved design equations have been proposed for the shear strength of LSBs with web openings based on experimental results from this study.
Abstract:
Focusing on the use of language is a crucial strategy in good mathematics teaching and a teacher’s guidance can assist students to master the language of mathematics. This article discusses the statements with reference to recent year 7 and 9 NAPLAN numeracy tests. It draws the readers’ attention to the complexities of language in the field of mathematics. Although this article refers to NAPLAN numeracy tests it also offers advice about good teaching practice.
Abstract:
This study of working-class and middle-class youth theatre workshops examines the processes through which this cultural form is appropriated by different class groups. Whereas the middle-class workshop proceeded efficiently and harmoniously, the working-class group resisted a number of institutional constraints traditionally associated with play rehearsal and performance. The processes of such symbolic struggle in the working-class group appeared to differ from Bourdieu's account of cultural domination. The article explores the explanatory contribution of the ethnographic case study to the analysis of the class basis of cultural tastes and practices and suggests that Bourdieu's account of class relations would gain from inclusion of this level of analysis. The situated study of the youth theatre workshops suggests that at this level, there is possibly more scope for symbolic struggle between the classes than was found by Bourdieu.
Abstract:
The LiteSteel beam (LSB) is a new hollow flange channel section developed by OneSteel Australian Tube Mills using their patented dual electric resistance welding and automated continuous roll-forming process. It has a unique geometry consisting of torsionally rigid rectangular hollow flanges and a relatively slender web. The LSBs are commonly used as flexural members in buildings. However, the LSB flexural members are subjected to lateral distortional buckling, which reduces their member moment capacities. Unlike the commonly observed lateral torsional buckling of steel beams, the lateral distortional buckling of LSBs is characterised by simultaneous lateral deflection, twist, and cross sectional change due to web distortion. An experimental study including more than 50 lateral buckling tests was therefore conducted to investigate the behaviour and strength of LSB flexural members. It included the available 13 LSB sections with spans ranging from 1200 to 4000 mm. Lateral buckling tests based on a quarter point loading were conducted using a special test rig designed to simulate the required simply supported and loading conditions accurately. Experimental moment capacities were compared with the predictions from the design rules in the Australian cold-formed steel structures standard. The new design rules in the standard were able to predict the moment capacities more accurately than previous design rules. This paper presents the details of lateral distortional buckling tests, in particular the features of the lateral buckling test rig, the results and the comparisons. It also includes the results of detailed studies into the mechanical properties and residual stresses of LSBs.
Abstract:
This thesis develops, applies and analyses a collaborative design methodology for branding a tourism destination. The area between the Northern Tablelands and the Mid-North Coast of New South Wales, Australia, was used as a case study for this research. The study applies theoretical concepts of systems thinking and complexity to the real world, and tests the use of design as a social tool to engage multiple stakeholders in planning. In this research I acknowledge that places (and destinations) are socially constructed through people's interactions with their physical and social environments. This study explores a methodology that is explicit about the uncertainties of the destination’s system, and that helps to elicit knowledge and system trends. The collective design process used the creation of brand concepts, elements and strategies as instruments to directly engage stakeholders in the process of reflecting about their places and the issues related to tourism activity in the region. The methods applied included individual conversations and collaborative design sessions to elicit knowledge from local stakeholders. Concept maps were used to register and interpret information released throughout the process. An important aspect of the methodology was to bring together different stakeholder groups and translate the information into a common language that was understandable by all participants. This work helped release significant information as to what kind of tourism activity local stakeholders are prepared to receive and support. It also helped the emergence of a more unified regional identity. The outcomes delivered by the project (brand, communication material and strategies) were of high quality and in line with the desires and expectations of the local hosts. The process also reinforced the local sense of pride, belonging and conservation.
Furthermore, interaction between participants from different parts of the region triggered some self-organising activity around the brand they created together. A major contribution of the present work is the articulation of an inclusive methodology to facilitate the involvement of locals in the decision-making process related to tourism planning. Of particular significance is the focus on the social construction of meaning in and through design, showing that design exercises can have significant social impact, not only on the final product but also on the realities of the people involved in the creative process.
Abstract:
Corporate sponsorship of events contributes significantly to marketing aims, including brand awareness as measured by recall and recognition of sponsor‐event pairings. Unfortunately, resultant advantages accrue disproportionately to brands having a natural or congruent fit with the available sponsorship properties. In three cued‐recall experiments, the effect of articulation of sponsorship fit on memory for sponsor‐event pairings is examined. While congruent sponsors have a natural memory advantage, results demonstrate that memory improvements via articulation are possible for incongruent sponsor‐event pairings. These improvements are, however, affected by the presence of competitor brands and the way in which memory is accessed.