818 results for speaker diarization
Abstract:
In this paper, we present the HyperSausage Neuron, based on High-Dimension Space (HDS), and propose a new algorithm for speaker-independent continuous digit speech recognition. Compared with the HMM-based method, the HyperSausage Neuron method achieves a higher recognition rate.
Abstract:
As Levelt and Meyer (2000) noted, studies of lexical access during the production of multiword utterances, such as phrases and sentences, raise two novel questions that studies of single-word production do not. First, does access to the different words in a sentence occur in a parallel or a serial fashion? Second, does it occur in an interactive or a discrete fashion? The latter question concerns the horizontal information flow (Smith & Wheeldon, 2004), which is a very important aspect of continuous speech production. A variant of the picture–word interference paradigm, combined with eye tracking and a dual-task paradigm, was used in 7 experiments to investigate the horizontal flow of semantic and phonological information between nouns in spoken Mandarin Chinese sentences. The results suggested that: 1. Before speech onset, semantic information for words across the whole sentence has been activated, whereas phonological activation is limited to the first phrase of the sentence. 2. Before speech onset, the speaker looks ahead and checks the semantic information of later words while the first noun is being processed; such look-ahead for phonological information occurs only within the first phrase of the sentence. 3. After speech onset, the speaker concentrates on the content words beyond the first one and checks the semantic information of other words within the same sentence. 4. Lexical access to multiple words during spoken sentence production proceeds in a partly serial and partly parallel manner, which supports the unit-by-unit and incremental view proposed by Levelt (2000). 5. The horizontal information flow during spoken sentence production is not an automatic process and is constrained by cognitive resources.
Abstract:
Voice alarms play an important role in the emergency evacuation of public places because they can provide information and instruct evacuation. This paper studied the optimization of acoustic and semantic parameters of voice alarms in emergency evacuation, so that alarm design can improve evacuation performance. Both magnitude estimation and scaling methods were used to investigate participants' perceived urgency of alarms with different parameters. The results indicated that participants rated alarms as more urgent when they had a faster speech rate, a greater signal-to-noise ratio (SNR), and were presented under louder noise. There was an interaction between noise level and the content of the voice alarm. Signals with a speech rate below 4 characters/second were rated as not urgent at all. Intelligibility of the voice alarm was investigated by evaluating key-point recognition performance. The results showed that the effect of speech rate was marginally significant, and that 7 characters/second yielded the highest intelligibility. This may be because the faster the signal is spoken, the more attention is paid to it. Speaker gender and SNR did not have a significant effect on the signals' intelligibility. This paper also investigated the impact of voice alarm content on human behavior during emergency evacuation in a 3-D virtual reality environment. In the condition of "telling the occupants what had happened and what to do", the number of participants who evacuated successfully was the largest. A further study, in which similar numbers of participants evacuated successfully in the three conditions, indicated that reaction time and evacuation time were shortest in that same condition. Although a one-way ANOVA showed that the difference was not significant, the results still provide some reference for alarm design. In sum, the parameters of voice alarms in emergency evacuation should be chosen to meet the needs of both perceived urgency and intelligibility.
The content of the alarms should include "what had happened and what to do", and should vary according to noise levels in different public places.
Abstract:
The primary goal of this report is to demonstrate how considerations from computational complexity theory can inform grammatical theorizing. To this end, generalized phrase structure grammar (GPSG) linguistic theory is revised so that its power more closely matches the limited ability of an ideal speaker-hearer: GPSG Recognition is EXP-POLY time hard, while Revised GPSG Recognition is NP-complete. A second goal is to provide a theoretical framework within which to better understand the wide range of existing GPSG models, embodied in formal definitions as well as in implemented computer programs. A grammar for English and an informal explanation of the GPSG/RGPSG syntactic features are included in appendices.
Abstract:
This report investigates the process of focussing as a description and explanation of the comprehension of certain anaphoric expressions in English discourse. The investigation centers on the interpretation of definite anaphora, that is, on the personal pronouns and on noun phrases used with a definite article (the, this, or that). Focussing is formalized as a process in which a speaker centers attention on a particular aspect of the discourse. An algorithmic description specifies what the speaker can focus on and how the speaker may change the focus of the discourse as the discourse unfolds. The algorithm allows for a simple focussing mechanism to be constructed: an element in focus, an ordered collection of alternate foci, and a stack of old foci. The data structure for the element in focus is a representation which encodes a limited set of associations between it and other elements from the discourse as well as from general knowledge.
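The three-part focussing mechanism named in this abstract (an element in focus, an ordered collection of alternate foci, and a stack of old foci) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the report's actual algorithm: the class name `FocusState`, its method names, the use of plain strings for discourse elements, and the rules for when the focus shifts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FocusState:
    """Hypothetical container for the three parts listed in the abstract."""
    current: Optional[str] = None                         # element in focus
    alternates: List[str] = field(default_factory=list)   # ordered alternate foci
    old_foci: List[str] = field(default_factory=list)     # stack of old foci

    def shift_to(self, element: str) -> None:
        """Make `element` the focus and push the previous focus on the stack."""
        if self.current is not None:
            self.old_foci.append(self.current)
        if element in self.alternates:
            # An alternate that becomes the focus is no longer an alternate.
            self.alternates.remove(element)
        self.current = element

    def return_to_previous(self) -> Optional[str]:
        """Pop the most recent old focus and make it current (assumed rule)."""
        if not self.old_foci:
            return None
        self.current = self.old_foci.pop()
        return self.current


# Illustrative discourse elements, not taken from the report.
state = FocusState(current="the meeting", alternates=["the agenda", "the chair"])
state.shift_to("the agenda")
```

After the shift, "the meeting" sits on the stack of old foci, so a later anaphor could be resolved against it by calling `return_to_previous()`. The associations between the focused element and general knowledge mentioned in the abstract are not modelled here.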
Abstract:
An understanding of research is important to enable nurses to provide evidence-based care. However, undergraduate nursing students often find research a challenging subject. The purpose of this paper is to present an evaluation of the introduction of podcasts in an undergraduate research module to enhance research-teaching linkages between the theoretical content and research in practice and to improve the level of student support offered in a blended learning environment. Two cohorts of students (n=228 and n=233) were given access to a series of 5 “guest speaker” podcasts made up of presentations and interviews with research experts within Edinburgh Napier. These staff would not normally have contact with students on this module, but through the podcasts were able to share their research expertise and methods with our learners. The main positive results of the podcasts were the increased understanding achieved by students due to the multi-modal delivery approach, a more personal student/tutor relationship leading to greater engagement, and the effective use of materials for revision and consolidation purposes. Negative effects of the podcasts centred around problems with the technology, most often difficulty in downloading and accessing the material. This paper contributes to the emerging knowledge base of podcasting in nurse education by demonstrating how podcasts can be used to enhance research-teaching linkages and raises the question of why students do not exploit the opportunities for mobile learning.
Abstract:
Faculty of History: Institute of Ethnology and Cultural Anthropology
Abstract:
Existing work in Computer Science and Electronic Engineering demonstrates that Digital Signal Processing techniques can effectively identify the presence of stress in the speech signal. These techniques use datasets containing real or actual stress samples i.e. real-life stress such as 911 calls and so on. Studies that use simulated or laboratory-induced stress have been less successful and inconsistent. Pervasive, ubiquitous computing is increasingly moving towards voice-activated and voice-controlled systems and devices. Speech recognition and speaker identification algorithms will have to improve and take emotional speech into account. Modelling the influence of stress on speech and voice is of interest to researchers from many different disciplines including security, telecommunications, psychology, speech science, forensics and Human Computer Interaction (HCI). The aim of this work is to assess the impact of moderate stress on the speech signal. In order to do this, a dataset of laboratory-induced stress is required. While attempting to build this dataset it became apparent that reliably inducing measurable stress in a controlled environment, when speech is a requirement, is a challenging task. This work focuses on the use of a variety of stressors to elicit a stress response during tasks that involve speech content. Biosignal analysis (commercial Brain Computer Interfaces, eye tracking and skin resistance) is used to verify and quantify the stress response, if any. This thesis explains the basis of the author’s hypotheses on the elicitation of affectively-toned speech and presents the results of several studies carried out throughout the PhD research period. These results show that the elicitation of stress, particularly the induction of affectively-toned speech, is not a simple matter and that many modulating factors influence the stress response process. 
A model is proposed to reflect the author’s hypothesis on the emotional response pathways relating to the elicitation of stress with a required speech content. Finally the author provides guidelines and recommendations for future research on speech under stress. Further research paths are identified and a roadmap for future research in this area is defined.
Abstract:
This paper explores the transnational and interstitial dimensions of cultural production in Britain today, and the representation of migrant and diasporic identities in contemporary mainstream British cinema. The box office success of films like Gurinder Chadha’s Bhaji on the Beach (1993) and Bend it Like Beckham (2002) and East is East (Damien O’Donnell 1999) and their precursors My Beautiful Laundrette (Stephen Frears 1985), Sammy and Rosie Get Laid (Stephen Frears 1987) and the TV mini-series Buddha of Suburbia (Roger Michell 1993) seems to celebrate and articulate a set of values around hybridity and alterity: a discourse of multiculturalism. This paper will engage with a series of key questions. Are there ideological values implicit within and common to all these texts? Can we map a rhetoric or discourse of multiculturalism within popular culture? Do mainstream representations of immigrant identities represent a discourse of resistance, a decolonising global culture, or is this Western brand of multiculturalism still located within an Orientalising gaze? In what ways are multiculturalism and postcolonialism overlapping and yet opposing rhetorics? [From the Author]
Abstract:
The Twentieth Century Society’s Spring lecture series (six in total) looks at the restoration and refurbishment of key C20 buildings in Britain and the US. Buildings covered: BBC Broadcasting House in London (G. Val Myer 1930-32, MacCormac Jamieson Prichard 2000-09). Speaker: Mark Hines (Mark Hines Architects), who was the project architect and is the author of The Story of Broadcasting House: Home of the BBC, 5 February 2009. Crown Hall, Chicago (Mies van der Rohe 1952), the Art and Architecture Building, Yale University, New Haven (Paul Rudolph 1961-63) and the former Wills head office in Bristol (SOM with YRM 1970-75). Speaker: Patrick Bellew (Atelier 10 Engineers), 12 February 2009. Center for British Art, Yale University, New Haven (Louis Kahn 1969-77). Speaker: Peter Inskip (Inskip and Jenkins Architects), 17 February 2009. Brunswick Centre, London (Patrick Hodgkinson 1967-72; Levitt Bernstein with Patrick Hodgkinson 2006). Speaker: Stuart Tappin (Stand Consulting Engineers Ltd), 26 February 2009. De La Warr Pavilion, Bexhill-on-Sea (Mendelsohn and Chermayeff 1934-5, John McAslan and Partners 2000-05). Speaker: Mark Cannata (HOK Architects), 5 March 2009. Finsbury Health Centre, London (Lubetkin & Tecton 1938, first phase of conservation work Avanti Architects 1995). Speaker: John Allan of Avanti Architects, 12 March 2009.
Abstract:
We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.
Abstract:
In this paper we present the application of Hidden Conditional Random Fields (HCRFs) to modelling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.
Abstract:
In sexually selected signals, distinct components often have specific signal value in mate choice or male-male competition. In songbirds, structural song traits such as trills, that is, a series of repetitive notes, can be important in female choice. However, little is known about their signal value in male-male interactions. Here, we investigated the hypothesis that males assess the competitive abilities of rivals based on the use and performance of rapid broadband trills produced within songs. Using a 2-speaker playback experiment, we exposed territorial male nightingales, Luscinia megarhynchos, that differed in their subsequent pairing success, to a simulated vocal interaction between 2 unfamiliar rivals. The singing of the 2 simulated rivals differed in the number of songs containing rapid broadband trills. Subjects responded significantly more strongly to the loudspeaker that broadcast songs containing such trills than to the loudspeaker that broadcast exclusively songs without such trills. Moreover, responses also depended on the fine structure of trills. Males that became paired later in the season significantly increased their response intensity with increasing trill performance, whereas males that remained unpaired responded in the opposite way and decreased their response intensity with increasing trill performance. These results indicate that rapid broadband trills are a signal of aggression and that the nature of the response in vocal interactions reflects aspects of the challenged male's fitness. © The Author 2008. Published by Oxford University Press on behalf of the International Society for Behavioral Ecology. All rights reserved.
Abstract:
Effects of vowel variation on interaction are considered, with particular relevance to their role in conversational breakdown. The effect of speaker knowledge and experience is noted as a variable in developmental progress which must inform profiling decisions, and the need for appropriate taxonomies of speech varieties is emphasized as a precursor to clinical and educational assessments. It is noted, too, that a shared sociolinguistic background between speaker and listener does not always resolve difficulties arising from non-target realizations, casting some doubt on ideas that assessors always possess a guaranteed sense of phonological variability and its effects. Hence, an informed understanding of phonological variation, rather than merely awareness that such variation exists, is advocated.