Digital Library

932 results for Speech and pioneering sports Colima

Eigenvoice modeling for cross likelihood ratio based speaker clustering : a Bayesian approach

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to better capture differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantages of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities required by the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, with a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
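As a rough illustration of the criterion named in this abstract, the sketch below computes a commonly cited form of the CLR from total log-likelihoods, normalised by segment length and measured against a background model. The function name, its arguments, and the use of a universal background model (UBM) as the reference are assumptions for illustration; the paper's eigenvoice and Bayesian formulation is not reproduced here and may differ.

```python
def cross_likelihood_ratio(ll_i_under_j, ll_i_under_ubm, n_i,
                           ll_j_under_i, ll_j_under_ubm, n_j):
    """Frame-normalised cross likelihood ratio between clusters i and j.

    ll_a_under_b is the total log-likelihood of cluster a's frames under
    model b (hypothetical inputs); n_i and n_j are the frame counts.
    """
    # How much better cluster i's data is explained by model j than by the UBM
    term_i = (ll_i_under_j - ll_i_under_ubm) / n_i
    # The symmetric term for cluster j's data under model i
    term_j = (ll_j_under_i - ll_j_under_ubm) / n_j
    return term_i + term_j
```

A higher value suggests the two clusters are more likely to belong to the same speaker, so a clustering loop would typically merge the pair with the highest CLR while it stays above a chosen threshold.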

Speaker attribution of multiple telephone conversations using a complete-linkage clustering approach

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
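The idea of linkage clustering without retraining can be sketched as follows: pairwise distances between segments are computed once, and clusters are merged using only those fixed distances. This is a minimal sketch under stated assumptions, not the system evaluated in the paper; the function name, the distance matrix, and the stopping threshold are all hypothetical.

```python
from itertools import combinations

def complete_linkage(dist, threshold):
    """Agglomerative clustering with complete linkage on a fixed distance matrix.

    dist is a symmetric matrix of pairwise distances (hypothetical input).
    Merging stops once the closest pair of clusters exceeds threshold.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest pairwise distance
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        # Merge b into a; no model is retrained, only index lists change
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Replacing `max` with `min` in the linkage step would give single-linkage clustering, the variant the abstract compares against.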

I-vector based speaker recognition using advanced channel compensation techniques

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper investigates advanced channel compensation techniques for the purpose of improving i-vector speaker verification performance in the presence of high intersession variability using the NIST 2008 and 2010 SRE corpora. The performance of four channel compensation techniques has been investigated: (a) weighted maximum margin criterion (WMMC), (b) source-normalized WMMC (SN-WMMC), (c) weighted linear discriminant analysis (WLDA), and (d) source-normalized WLDA (SN-WLDA). We show that, by extracting the discriminatory information between pairs of speakers as well as capturing the source variation information in the development i-vector space, the SN-WLDA based cosine similarity scoring (CSS) i-vector system provides over 20% improvement in EER for NIST 2008 interview and microphone verification and over 10% improvement in EER for NIST 2008 telephone verification, when compared to the SN-LDA based CSS i-vector system. Further, score-level fusion techniques are analyzed to combine the best channel compensation approaches, providing over 8% improvement in DCF over the best single approach (SN-WLDA) for the NIST 2008 interview/telephone enrolment-verification condition. Finally, we demonstrate that the improvements found in the context of CSS also generalize to state-of-the-art GPLDA, with up to 14% relative improvement in EER for NIST SRE 2010 interview and microphone verification and over 7% relative improvement in EER for NIST SRE 2010 telephone verification.
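Cosine similarity scoring, as referred to in this abstract, compares two i-vectors by the cosine of the angle between them. The sketch below shows only that scoring step on plain Python lists. The function name and inputs are hypothetical, and the channel compensation projection (for example SN-WLDA) that would be applied to both vectors beforehand is assumed to have happened already.

```python
import math

def cosine_score(w_target, w_test):
    """Cosine similarity between two (already compensated) i-vectors."""
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    # The score lies in [-1, 1]; higher means more likely the same speaker
    return dot / (norm_target * norm_test)
```

In a verification setting the score would be compared against a decision threshold tuned on development data.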

Speaker diarization : "who spoke when"

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts, including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracy. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain, and the systems developed throughout this work are trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains, including telephone conversations and meeting audio.
Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed to govern the detection and heuristic selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single threshold based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, were shown to generally outperform their non-Bayesian counterparts.

Improving PLDA speaker verification with limited development data

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper analyses the probabilistic linear discriminant analysis (PLDA) speaker verification approach with limited development data. It investigates the use of the median as the central tendency of a speaker’s i-vector representation, and the effectiveness of weighted discriminative techniques on the performance of state-of-the-art length-normalised Gaussian PLDA (GPLDA) speaker verification systems. The analysis shows that the median (using a median Fisher discriminator (MFD)) provides a better representation of a speaker when the number of representative i-vectors available during development is reduced, and that the pair-wise weighting approach in weighted LDA and weighted MFD provides further improvement in limited development conditions. Best performance is obtained using a weighted MFD approach, which shows over 10% improvement in EER over the baseline GPLDA system on mismatched and interview-interview conditions.

Do the kinematics of a baulked take-off in springboard diving differ from those of a completed dive?

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Consistency and invariance in movements are traditionally viewed as essential features of skill acquisition and elite sports performance. This emphasis on the stabilization of action has resulted in important processes of adaptation in movement coordination during performance being overlooked in investigations of elite sport performance. Here we investigate whether differences exist between the movement kinematics displayed by five elite springboard divers (age 17 ± 2.4 years) in the preparation phases of baulked and completed take-offs. The two-dimensional kinematic characteristics of the reverse somersault take-off phases (approach and hurdle) were recorded during normal training sessions and used for intra-individual analysis. All participants displayed observable differences in movement patterns at key events during the approach phase; however, the presence of similar global topological characteristics suggested that, overall, participants did not perform distinctly different movement patterns during completed and baulked dives. These findings provide a powerful rationale for coaches to consider assessing functional variability or adaptability of motor behaviour as a key criterion of successful performance in sports such as diving.

Co-thought gestures : supporting students to successfully navigate map tasks

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This study considers the role and nature of co-thought gestures when students process map-based mathematics tasks. These gestures are typically spontaneously produced silent gestures which do not accompany speech and are represented by small movements of the hands or arms often directed toward an artefact. The study analysed 43 students (aged 10–12 years) over a 3-year period as they solved map tasks that required spatial reasoning. The map tasks were representative of those typically found in mathematics classrooms for this age group and required route finding and coordinate knowledge. The results indicated that co-thought gestures were used to navigate the problem space and monitor movements within the spatial challenges of the respective map tasks. Gesturing was most influential when students encountered unfamiliar tasks or when they found the tasks spatially demanding. From a teaching and learning perspective, explicit co-thought gesturing highlights cognitive challenges students are experiencing since students tended to not use gesturing in tasks where the spatial demands were low.

Detecting rare events using Kullback-Leibler divergence

Relevance:

100.00% 100.00%

Publisher:

Abstract:

One main challenge in developing a system for visual surveillance event detection is the annotation of target events in the training data. Assuming that events of security interest are rare compared to regular behaviours, this paper presents a novel approach that uses Kullback-Leibler (KL) divergence for rare event detection in a weakly supervised learning setting, where only clip-level annotation is available. It is shown that this approach outperforms state-of-the-art methods on a popular real-world dataset, while preserving real-time performance.
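For reference, the KL divergence between two discrete distributions can be computed as in the sketch below. The abstract does not say how the divergence is applied to video clips, so this shows only the standard definition; the function name, the histogram inputs, and the `eps` guard against zero probabilities are assumptions for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) for discrete distributions given as lists.

    Terms with p[i] == 0 contribute nothing; eps (an assumed guard) avoids
    division by zero where q[i] == 0.
    """
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)
```

One plausible use, stated here only as an assumption, is to compare a clip's feature histogram against a model of regular behaviour: a large divergence would flag the clip as a candidate rare event.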

On the use of speaker superfactors for speaker recognition

Relevance:

100.00% 100.00%

Publisher:

Searching for semantic person queries using channel representations

Relevance:

100.00% 100.00%

Publisher:

Abstract:

It is not uncommon to hear a person of interest described by their height, build, and clothing (i.e. type and colour). These semantic descriptions are commonly used by people to describe others, as they are quick to relate and easy to understand. However, such queries are not easily utilised within intelligent surveillance systems, as they are difficult to transform into a representation that can be searched for automatically in large camera networks. In this paper we propose a novel approach that transforms such a semantic query into an avatar that is searchable within a video stream, and demonstrate state-of-the-art performance for locating a subject in video based on a description.

Access to the Internet Submission to ALRC IP 46

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This submission is directed to issues arising in respect of the need to recognise and support access to the internet for all Australian residents and citizens. As such, it addresses the following questions only: Question 2-1: What general principles or criteria should be applied to help determine whether a law that interferes with freedom of speech is justified? Question 2-2: Which Commonwealth laws unjustifiably interfere with freedom of speech, and why are these laws unjustified?

Drug administration guides in dysphagia

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Aim: The aim of this evaluation was to assess the use of Individualised Medication Administration Guides (I-MAGs) for patients with dysphagia on one stroke ward over a 6-month period. Background: Patients with dysphagia (PWD) are more likely to suffer an administration error than patients without swallowing difficulties. To both standardise and improve medicines administration to patients with dysphagia, I-MAGs were introduced on one stroke ward over a 6-month period. Methods: A software package supported with data on current national guidelines on the administration of medicines to PWD was designed by a pharmacist specialising in dysphagia to enable him to create individualised medication administration guides for patients with dysphagia, which stated how each medicine should be optimally prepared and administered. On completion of the pilot service a questionnaire was given to all nurses, pharmacists and speech and language therapists (SALTs) who had experienced the I-MAGs. All the professionals received the same questionnaire, but questions relevant only to their practice were added to the nurses’ questionnaire. Results: Of 26 healthcare professionals (HCPs) approached, 19 returned completed questionnaires. Higher variability was found in the 13 responses from the nurse respondents than in those from the 3 pharmacists and the 3 SALTs. 8 (61%) of the nurses felt more confident in their practice when I-MAGs were in place. 10 (76%) of the nurses admitted that the guides could sometimes increase the time of the administration, but saw that it made practice safer. All the pharmacists considered the recommendations in the guides useful and all the respondents with the exception of one nurse (12:13) would like this service to continue. Conclusion: I-MAGs were well received on the ward and they support individualised care for patients with dysphagia, but the guides needed additional pharmacist input and greater nursing time. Research to determine the cost effectiveness of I-MAGs is needed.

Variable phenotype in 16p duplication within a family

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: More individuals are now being identified with very rare genetic syndromes. We present a family with an inherited duplication of 16p11.2 to 16q12.1 in ring formation. Three of the four children (aged 15 months to 10 years), mother, uncle, and grandmother are affected. Our aim was to provide preliminary evidence of possible phenotypic patterns of learning and behaviour associated with this chromosome anomaly. Method: Psychometric assessments were undertaken with all four children. The mother and uncle also agreed to participate in the study. Measures of development (Bayley or Mullen), intellectual ability (WISC-IV or WAIS-III), academic achievement (WIAT-II), adaptive behaviour (Vinelands), and other relevant aspects of functioning (e.g., Children’s Memory Scale) were administered. Results: The first-born child is the only one who is unaffected. Her intellectual ability was assessed as being within the superior range. The second child experienced early difficulties with speech and motor skills. Although his intelligence is average, he has learning difficulties and significant auditory memory problems. The third child’s speech and motor milestones were markedly delayed. He has a complex medical history that includes a vitamin B12 deficiency. On the Mullen Scales at age 4 his scores ranged from average to very low. The development of the youngest child (aged 15 months), who also had a B12 deficiency but was treated early, was assessed as being within typical limits. Conclusions: There is considerable developmental variability among the three children with this inherited 16p duplication. We discuss the intriguing similarities and differences, considering common features that may reflect phenotypic patterns and speculating about possible explanations for the variable presentations.

Improving out-domain PLDA speaker verification using unsupervised inter-dataset variability compensation approach

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Experimental studies have found that when state-of-the-art probabilistic linear discriminant analysis (PLDA) speaker verification systems are trained using out-domain data, speaker verification performance is significantly affected by the mismatch between development data and evaluation data. To overcome this problem, we propose a novel unsupervised inter-dataset variability (IDV) compensation approach to compensate for the dataset mismatch. The IDV-compensated PLDA system achieves over 10% relative improvement in EER over the out-domain PLDA system by effectively compensating for the mismatch between in-domain and out-domain data.

Left versus right hemisphere differences in brain connectivity: 4-Tesla HARDI tractography in 569 twins

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Diffusion imaging can map anatomical connectivity in the living brain, offering new insights into fundamental questions such as how the left and right brain hemispheres differ. Anatomical brain asymmetries are related to speech and language abilities, but less is known about left/right hemisphere differences in brain wiring. To assess this, we scanned 457 young adults (age 23.4±2.0 SD years) and 112 adolescents (age 12-16) with 4-Tesla 105-gradient high-angular resolution diffusion imaging. We extracted fiber tracts throughout the brain with a Hough transform method. A 70×70 connectivity matrix was created, for each subject, based on the proportion of fibers intersecting 70 cortical regions. We identified significant differences in the proportions of fibers intersecting left and right hemisphere cortical regions. The degree of asymmetry in the connectivity matrices varied with age, as did the asymmetry in network topology measures such as the small-world effect.
