942 results for speaker linking


Relevance:

100.00%

Publisher:

Abstract:

Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.
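The linkage step named in this abstract can be illustrated with a minimal sketch of complete-linkage clustering. It is not the authors' system: the distance matrix, the threshold value and the function name `complete_linkage` are hypothetical, and the sketch assumes that pairwise distances between per-recording speaker models have already been computed by some speaker recognition back end.

```python
from itertools import combinations


def complete_linkage(dist, threshold):
    """Cluster items 0..n-1 given a symmetric distance matrix `dist`.

    Two clusters are merged only while their complete-linkage distance
    (the largest pairwise distance between their members) is at or
    below `threshold`. Returns a sorted list of sorted clusters.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the worst-case (maximum) member distance.
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe: combinations() guarantees a < b
    return sorted(clusters)


# Hypothetical distances between four per-recording speaker models.
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]
linked = complete_linkage(dist, threshold=0.5)
print(linked)
```

Because the merge score depends only on the stored pairwise distances, no speaker model is retrained after a merge, which is the property that separates linkage clustering from the retraining-based baseline mentioned in the abstract.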

Relevance:

100.00%

Publisher:

Abstract:

This research makes a major contribution that enables efficient searching and indexing of large archives of spoken audio based on speaker identity. It introduces a novel technique dubbed “speaker attribution”, which is the task of automatically determining ‘who spoke when?’ in recordings and then automatically linking the unique speaker identities within each recording across multiple recordings. The outcome of the research will also have a significant impact on the performance of automatic speech recognition systems through the extracted speaker identities.

Relevance:

70.00%

Publisher:

Abstract:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.

Relevance:

70.00%

Publisher:

Abstract:

Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system, applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this through using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without previous knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.
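The abstracts in this listing report both absolute and relative improvements in error rate, and the two are easy to confuse. The sketch below shows the arithmetic only; the function names and the DER values are hypothetical and are not taken from any of the papers.

```python
def absolute_improvement(baseline, improved):
    """Reduction in error rate, in percentage points."""
    return baseline - improved


def relative_improvement(baseline, improved):
    """Reduction in error rate as a percentage of the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical DER values in percent; they are not taken from the paper.
baseline_der, improved_der = 50.0, 30.0
print(absolute_improvement(baseline_der, improved_der))  # 20.0
print(relative_improvement(baseline_der, improved_der))  # 40.0
```

With these illustrative numbers, the same change is a 20-point absolute improvement but a 40% relative improvement.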

Relevance:

70.00%

Publisher:

Abstract:

In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpus for evaluating our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.

Relevance:

70.00%

Publisher:

Abstract:

We present a novel method for improving hierarchical speaker clustering in the tasks of speaker diarization and speaker linking. In hierarchical clustering, a tree can be formed that demonstrates various levels of clustering. We propose a ratio that expresses the impact of each cluster on the formation of this tree and use this to rescale cluster scores. This provides score normalisation based on the impact of each cluster. We use a state-of-the-art speaker diarization and linking system across the SAIVT-BNEWS corpus to show that our proposed impact ratio can provide a relative improvement of 16% in diarization error rate (DER).

Relevance:

40.00%

Publisher:

Abstract:

We present a clustering-only approach to the problem of speaker diarization to eliminate the need for the commonly employed and computationally expensive Viterbi segmentation and realignment stage. We use multiple linear segmentations of a recording and carry out complete-linkage clustering within each segmentation scenario to obtain a set of clustering decisions for each case. We then collect all clustering decisions, across all cases, to compute a pairwise vote between the segments and conduct complete-linkage clustering to cluster them at a resolution equal to the minimum segment length used in the linear segmentations. We use our proposed cluster-voting approach to carry out speaker diarization and linking across the SAIVT-BNEWS corpus of Australian broadcast news data. We compare our technique to an equivalent baseline system with Viterbi realignment and show that our approach can outperform the baseline technique with respect to the diarization error rate (DER) and attribution error rate (AER).
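The pairwise vote described in this abstract can be sketched as a co-association count. This is an assumption about the general idea rather than the authors' exact scheme: the function name `pairwise_votes` and the labels are hypothetical, and every segmentation scenario is assumed to have been mapped onto the same base segments (the minimum segment length) beforehand.

```python
def pairwise_votes(labelings):
    """Fraction of labelings in which each pair of segments shares a cluster.

    `labelings` is a list of equal-length label lists, one per
    segmentation scenario, all expressed at the same base resolution.
    """
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(labelings)
    return [[c / total for c in row] for row in counts]


# Hypothetical cluster labels for four base segments under three scenarios.
labelings = [
    ["A", "A", "B", "B"],
    ["X", "X", "X", "Y"],
    ["P", "Q", "Q", "Q"],
]
votes = pairwise_votes(labelings)
# Turn agreement into a distance for a final clustering pass.
distances = [[1.0 - v for v in row] for row in votes]
```

The resulting `distances` matrix could then be passed to any complete-linkage routine to produce the final clustering. Label names need not match across scenarios, because only agreement within a scenario is counted.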

Relevance:

20.00%

Publisher:

Abstract:

Purpose: In the present study, we consider the mechanical properties of phosphate glasses under high-temperature-induced and friction-induced cross-linking, which enhances the modulus of elasticity. Design/methodology/approach: Two nanomechanical properties are evaluated: the modulus of elasticity (E), or Young's modulus, and the hardness (H). Zinc meta-, pyro- and orthophosphates, recognized as amorphous-colloidal nanoparticles, were synthesized under laboratory conditions and showed antiwear properties in engine oil. Findings: Young's modulus of the phosphate glasses formed under high temperature was in the 60-89 GPa range. For phosphate tribofilm formed under friction, the hardness and the Young's modulus were in the ranges of 2-10 GPa and 40-215 GPa, respectively. The degree of cross-linking during friction is provided by an internal pressure of about 600 MPa and a temperature close to 1000°C, enhancing mechanical properties by a factor of 3 (see Fig. 1). Research limitations/implications: The addition of iron or aluminum ions to phosphate glasses under high-temperature- and friction-induced amorphization of zinc metaphosphate and pyrophosphate tends to provide more cross-linking and mechanically stronger structures. Iron and aluminum (FeO4 or AlO4 units), incorporated into the phosphate structure as network formers, contribute to the anion network bonding by converting the P=O bonds into bridging oxygen. Future work should consider the development of new materials prepared by sol-gel processes, e.g., zinc(II)-silicic acid. Originality/value: This paper analyses the two factors, friction pressure and temperature, that lead phosphate tribofilm glasses to chemically advanced glass structures, which may enhance wear inhibition. Adding the coordinating ions alters the pressure at which cross-linking occurs and significantly increases the antiwear properties of the surface material.

Relevance:

20.00%

Publisher:

Abstract:

The effectiveness of higher-order spectral (HOS) phase features in speaker recognition is investigated by comparison with Mel-cepstral features on the same speech data. HOS phase features retain phase information from the Fourier spectrum, unlike Mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models are constructed from Mel-cepstral features and HOS features, respectively, for the same data from various speakers in the Switchboard Telephone Speech Corpus. Feature clusters, model parameters and classification performance are analyzed. HOS phase features on their own provide a correct identification rate of about 97% on the chosen subset of the corpus. This is the same level of accuracy as provided by MFCCs. Cluster plots and model parameters are compared to show that HOS phase features can provide complementary information to better discriminate between speakers.

Relevance:

20.00%

Publisher:

Abstract:

The issue of ‘rigour vs. relevance’ in IS research has generated an intense, heated debate for over a decade. It is possible to identify, however, only a limited number of contributions on how to increase the relevance of IS research without compromising its rigour. Based on a lifecycle view of IS research, we propose the notion of ‘reality checks’ in order to review IS research outcomes in the light of actual industry demands. We assume that five barriers impact the efficient transfer of IS research outcomes; they are lack of awareness, lack of understandability, lack of relevance, lack of timeliness, and lack of applicability. In seeking to understand the effect of these barriers on the transfer of mature IS research into practice, we used focus groups. We chose DeLone and McLean’s IS success model as our stimulus because it is one of the more widely researched areas of IS.