994 results for Speech interaction
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
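The idea of dynamic sequence matching mentioned in this abstract can be illustrated with a minimal sketch. It assumes that lattice paths have already been flattened into phone sequences and that a plain unit-cost edit distance stands in for the matching cost; the function names, the toy phone labels and the `max_cost` threshold are all hypothetical and are not taken from the thesis.

```python
def edit_distance(target, observed):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `observed` into `target` (unit costs, an assumption)."""
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            cost = 0 if t == o else 1
            curr.append(min(prev[j] + 1,          # delete from target
                            curr[j - 1] + 1,      # insert into target
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def spot(target, candidates, max_cost=1):
    """Keep candidate phone sequences within `max_cost` edits of the target."""
    return [c for c in candidates if edit_distance(target, c) <= max_cost]


# Toy data: the second candidate contains one recognition error.
target = ["k", "ae", "t"]
candidates = [["k", "ae", "t"], ["k", "eh", "t"], ["d", "ao", "g"]]
hits = spot(target, candidates, max_cost=1)
```

With an exact-match search only the first candidate would be found; allowing one edit also recovers the sequence with a single substituted phone, which is the kind of robustness to erroneous lattice realisations the abstract describes.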
Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers of different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, an LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
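As a rough illustration of linear fusion, the sketch below combines per-language scores from two subsystems by a weighted sum and picks the highest-scoring language. The score values, the equal weights and the function name `fuse_scores` are assumptions made for the example; the abstract does not specify how the weights were chosen.

```python
def fuse_scores(score_sets, weights):
    """Weighted sum of per-language scores from several subsystems."""
    languages = score_sets[0].keys()
    return {
        lang: sum(w * scores[lang] for w, scores in zip(weights, score_sets))
        for lang in languages
    }


# Hypothetical log-likelihood style scores from two subsystems.
acoustic = {"english": -1.0, "mandarin": -2.0}   # e.g. a GMM-based system
phonetic = {"english": -3.0, "mandarin": -1.5}   # e.g. a phone-recogniser system

fused = fuse_scores([acoustic, phonetic], weights=[0.5, 0.5])
best = max(fused, key=fused.get)
```

Here the acoustic scores alone favour English, but the fused scores favour Mandarin, which shows how a second information source can change the decision.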
Abstract:
In this paper, a static synchronous series compensator (SSSC), along with a fixed capacitor, is used to avoid torsional mode instability in a series compensated transmission system. A 48-step harmonic neutralized inverter is used for the realization of the SSSC. The system under consideration is the IEEE first benchmark model on SSR analysis. The system stability is studied both through eigenvalue analysis and EMTDC/PSCAD simulation studies. It is shown that the combination of the SSSC and the fixed capacitor improves the synchronizing power coefficient. The presence of the fixed capacitor ensures increased damping of small signal oscillations. At higher levels of fixed capacitor compensation, a damping controller is required to stabilize the torsional modes of SSR.
Abstract:
The effective atomic number is widely employed in radiation studies, particularly for the characterisation of interaction processes in dosimeters, biological tissues and substitute materials. Gel dosimeters are unique in that they comprise both the phantom and dosimeter material. In this work, effective atomic numbers for total and partial electron interaction processes have been calculated for the first time for a Fricke gel dosimeter, five hypoxic and nine normoxic polymer gel dosimeters. A range of biological materials are also presented for comparison. The spectrum of energies studied spans 10 keV to 100 MeV, over which the effective atomic number varies by 30 %. The effective atomic numbers of gels match those of soft tissue closely over the full energy range studied; greater disparities exist at higher energies but are typically within 4 %.
Abstract:
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state-of-the-art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
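The difference between magnitude-only and complex subtraction can be seen on a single toy frame. The sketch below assumes an additive model (noisy = clean + noise in each frequency bin) and oracle knowledge of the noise magnitude and phase; it is one plausible reading of the idea, not the paper's CSS formulation, and all names and values are hypothetical.

```python
import cmath


def magnitude_subtraction(noisy, noise_mag):
    """Subtract the noise magnitude per bin and reuse the noisy phase."""
    out = []
    for y, n_mag in zip(noisy, noise_mag):
        mag = max(abs(y) - n_mag, 0.0)  # floor at zero: magnitudes cannot be negative
        out.append(cmath.rect(mag, cmath.phase(y)))
    return out


def complex_subtraction(noisy, noise_mag, noise_phase):
    """Subtract the noise as a complex number, using magnitude and phase."""
    return [y - cmath.rect(n_mag, n_ph)
            for y, n_mag, n_ph in zip(noisy, noise_mag, noise_phase)]


def spectral_error(estimate, reference):
    """Sum of squared complex differences between two spectra."""
    return sum(abs(e - r) ** 2 for e, r in zip(estimate, reference))


# Toy three-bin spectra (made-up values).
clean = [complex(1.0, 0.5), complex(-0.3, 0.8), complex(0.2, -0.6)]
noise = [complex(0.2, -0.4), complex(0.5, 0.1), complex(-0.3, 0.3)]
noisy = [c + n for c, n in zip(clean, noise)]

noise_mag = [abs(n) for n in noise]            # oracle noise magnitude
noise_phase = [cmath.phase(n) for n in noise]  # oracle noise phase

err_magnitude = spectral_error(magnitude_subtraction(noisy, noise_mag), clean)
err_complex = spectral_error(complex_subtraction(noisy, noise_mag, noise_phase), clean)
```

With oracle phase the complex subtraction recovers the clean spectrum up to rounding error, while the magnitude-only estimate does not. In practice the phase must be estimated, which is the role the abstract assigns to PEDEP.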
Abstract:
This workshop explores innovative approaches to understanding and cultivating sustainable food culture in urban environments via human-computer interaction (HCI) design and ubiquitous technologies. We perceive the city as an intersecting network of people, place, and technology in constant transformation. Our 2009 OZCHI workshop, Hungry 24/7? HCI Design for Sustainable Food Culture, opened a new space for discussion on this intersection amongst researchers and practitioners from diverse backgrounds including academia, government, industry, and not-for-profit organisations. Building on the past success, this new instalment of the workshop series takes a more refined view of mobile human-food interaction and the role of interactive media in engaging citizens to cultivate more sustainable everyday human-food interactions on the go. Interactive media in this sense is distributed, pervasive, and embedded in the city as a network. The workshop addresses environmental, health, and social domains of sustainability by bringing together insights across disciplines to discuss conceptual and design approaches in orchestrating mobility and interaction of people and food in the city as a network of people, place, technology, and food.
Abstract:
To understand human behavior, it is important to know under what conditions people deviate from selfish rationality. This study explores the interaction of natural survival instincts and internalized social norms using data on the sinking of the Titanic and the Lusitania. We show that time pressure appears to be crucial when explaining behavior under extreme conditions of life and death. Even though the two vessels and the composition of their passengers were quite similar, the behavior of the individuals on board was dramatically different. On the Lusitania, selfish behavior dominated (which corresponds to the classical homo oeconomicus); on the Titanic, social norms and social status (class) dominated, which contradicts standard economics. This difference could be attributed to the fact that the Lusitania sank in 18 minutes, creating a situation in which the short-run flight impulse dominates behavior. On the slowly sinking Titanic (2 hours, 40 minutes), there was time for socially determined behavioral patterns to re-emerge. To our knowledge, this is the first time that these shipping disasters have been analyzed in a comparative manner with advanced statistical (econometric) techniques using individual data of the passengers and crew. Knowing human behavior under extreme conditions allows us to gain insights about how varied human behavior can be depending on differing external conditions.
Abstract:
The integration of computer technologies into everyday classroom life continues to provide pedagogical challenges for school systems, teachers and administrators. Data from an exploratory case study of one teacher and a multiage class of children in the first years of schooling in Australia show that when young children use computers for set tasks in small groups, they require ongoing support from teachers and need to engage in peer interactions that are meaningful and productive. Classroom organization and the nature of teacher-child talk are key factors in engaging children in set tasks and producing desirable learning and teaching outcomes.