994 results for complete-linkage


Relevance: 100.00%

Abstract:

Speaker diarization determines instances of the same speaker within a recording. Extending this task across a collection of recordings, so that segments spoken by the same speaker are linked together, requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems: agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides relative improvements of 20% and 29% in attribution error rate (AER) over the AC and ACR systems, respectively.
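Complete-linkage clustering, named above as the core of the linking method, repeatedly merges the two clusters whose farthest pair of members is closest. The following is a minimal pure-Python sketch, assuming every diarized speaker cluster has already been scored against every other and the scores turned into a symmetric distance matrix. The `complete_linkage` function, the threshold and the example distances are hypothetical illustrations, not the paper's implementation.

```python
from itertools import combinations


def complete_linkage(dist, threshold):
    """Agglomerative complete-linkage clustering of items 0..n-1.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    At each step the two clusters whose farthest pair of members is
    closest are merged; merging stops once that distance exceeds
    `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        # Complete linkage: cluster distance = largest member-to-member distance.
        d, a, b = min(
            (max(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break  # even the closest pair of clusters is too far apart
        clusters[a] += clusters[b]
        del clusters[b]  # safe because combinations() yields a < b
    return sorted(sorted(c) for c in clusters)


# Hypothetical distances between four speaker clusters from different recordings.
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.95],
    [0.9, 0.7, 0.0, 0.3],
    [0.8, 0.95, 0.3, 0.0],
]
print(complete_linkage(dist, threshold=0.5))  # [[0, 1], [2, 3]]
```

Because every cross-cluster pair must fall within the threshold, one dissimilar member is enough to keep two groups apart. This naive loop rescans all cluster pairs after each merge, so large archives would need a more efficient algorithm.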

Relevance: 100.00%

Abstract:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose, which has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage-based diarization systems, with relative improvements of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
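Single and complete linkage differ only in how the distance between two clusters is defined: the closest pair of members versus the farthest pair. A "relative improvement" expresses an error reduction as a fraction of the baseline error. The sketch below shows both, using hypothetical points on a line and hypothetical error rates rather than the paper's figures; the function names are illustrative.

```python
def single_link(dist, a, b):
    """Single linkage: distance between the closest members of clusters a and b."""
    return min(dist[i][j] for i in a for j in b)


def complete_link(dist, a, b):
    """Complete linkage: distance between the farthest members of clusters a and b."""
    return max(dist[i][j] for i in a for j in b)


def relative_improvement(baseline_error, proposed_error):
    """Error reduction expressed as a fraction of the baseline error."""
    return (baseline_error - proposed_error) / baseline_error


# Four hypothetical segments evenly spaced on a line (distance = |p - q|).
points = [0.0, 1.0, 2.0, 3.0]
dist = [[abs(p - q) for q in points] for p in points]
print(single_link(dist, [0, 1], [2, 3]))    # 1.0
print(complete_link(dist, [0, 1], [2, 3]))  # 3.0
print(relative_improvement(20.0, 12.0))     # 0.4
```

Single linkage sees the two halves of this evenly spaced chain as close (1.0), which is how it can "chain" dissimilar segments together; complete linkage judges them by their most distant members (3.0) and so tends to form more compact clusters.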

Relevance: 100.00%

Abstract:

We propose a novel technique for robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two likelihood scores per segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques, demonstrating an absolute improvement of 15% in half-total error rate (HTER) over the best-performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm using visual, audio-visual, or a proposed fusion of these features.
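The abstract does not give the exact dissimilarity measure, so the sketch below assumes a simple one: the gap between two segments' speech-versus-non-speech log-likelihood ratios. It also shows the half-total error rate (HTER) used for evaluation, which averages the miss rate (speech labelled non-speech) and the false-alarm rate (non-speech labelled speech). All function names and example scores are hypothetical.

```python
def log_likelihood_ratio(scores):
    """scores = (speech_log_likelihood, nonspeech_log_likelihood) for one segment."""
    speech_ll, nonspeech_ll = scores
    return speech_ll - nonspeech_ll


def segment_dissimilarity(scores_a, scores_b):
    """Assumed dissimilarity: absolute gap between the two segments' LLRs."""
    return abs(log_likelihood_ratio(scores_a) - log_likelihood_ratio(scores_b))


def half_total_error_rate(reference, hypothesis):
    """HTER = mean of the miss rate and the false-alarm rate.

    reference/hypothesis hold True for speech and False for non-speech.
    """
    speech = [h for r, h in zip(reference, hypothesis) if r]
    nonspeech = [h for r, h in zip(reference, hypothesis) if not r]
    miss_rate = sum(1 for h in speech if not h) / len(speech)
    false_alarm_rate = sum(1 for h in nonspeech if h) / len(nonspeech)
    return (miss_rate + false_alarm_rate) / 2


# Hypothetical GMM scores for two segments and hypothetical frame labels.
print(segment_dissimilarity((-10.0, -12.0), (-11.0, -9.0)))  # 4.0
ref = [True] * 4 + [False] * 4
hyp = [True, True, True, False, False, False, True, True]
print(half_total_error_rate(ref, hyp))  # 0.375
```

The resulting pairwise dissimilarities would then feed a complete-linkage clustering stopped at two clusters, one for speech and one for non-speech.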

Relevance: 60.00%

Abstract:

In this paper we extend the concept of speaker annotation within a single recording, or speaker diarization, to a collection-wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering the presumably homogeneous intersession clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. An attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix, and the complete-linkage algorithm is employed to cluster the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.
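Cluster purity and coverage can be computed by counting, for each output cluster, how many items belong to its majority speaker (purity), and, for each true speaker, how many of its items fall into that speaker's majority cluster (coverage). The sketch below uses this common count-based definition. How the paper "matches" the two into a single 87.1% figure is not specified in the abstract, and `purity_and_coverage` and the example labels are hypothetical. Each item stands for one diarized cluster, counted once (an assumption; a real system might weight by speech duration).

```python
from collections import Counter, defaultdict


def purity_and_coverage(reference, hypothesis):
    """reference[i] is the true speaker of item i; hypothesis[i] is its output cluster."""
    by_cluster = defaultdict(Counter)  # output cluster -> counts of true speakers
    by_speaker = defaultdict(Counter)  # true speaker -> counts of output clusters
    for ref, hyp in zip(reference, hypothesis):
        by_cluster[hyp][ref] += 1
        by_speaker[ref][hyp] += 1
    total = len(reference)
    purity = sum(max(c.values()) for c in by_cluster.values()) / total
    coverage = sum(max(c.values()) for c in by_speaker.values()) / total
    return purity, coverage


# Hypothetical result that wrongly merges speakers A and B into cluster 1.
reference = ["A", "A", "A", "B", "B", "C"]
hypothesis = [1, 1, 1, 1, 1, 2]
print(purity_and_coverage(reference, hypothesis))
```

Under-clustering, as in this example, keeps coverage at 1.0 while purity drops; over-clustering does the opposite, which is why the two are usually reported together.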

Relevance: 60.00%

Abstract:

This research makes a major contribution that enables efficient searching and indexing of large archives of spoken audio based on speaker identity. It introduces a novel technique dubbed “speaker attribution”, which is the task of automatically determining ‘who spoke when?’ in recordings and then automatically linking the unique speaker identities within each recording across multiple recordings. The outcome of the research will also have a significant impact on improving the performance of automatic speech recognition systems through the extracted speaker identities.

Relevance: 60.00%

Abstract:

Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for attributing large sets of two-speaker telephone data. In this paper, we build on that approach to achieve a robust system applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this by using a robust cross-likelihood ratio (CLR) threshold as the stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module on a dataset of Australian broadcast news recordings, showing that diarization accuracy degrades significantly without prior knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system on the broadcast news data, demonstrating attribution error rates (AER) as low as 17%.
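One widely used formulation of the cross-likelihood ratio (CLR) between clusters i and j sums two frame-normalized terms. The first is the log-likelihood of cluster i's data under cluster j's model minus its log-likelihood under the UBM, divided by i's frame count. The second is the symmetric term for j's data under i's model. Higher values suggest the same speaker. The sketch below assumes that formulation and precomputed total log-likelihoods; the abstract does not state whether the paper uses exactly this normalization, and both function names are hypothetical. It also contrasts the two stopping rules mentioned above: a fixed speaker count versus a CLR threshold.

```python
def cross_likelihood_ratio(ll_i_under_j, ll_i_under_ubm, n_i,
                           ll_j_under_i, ll_j_under_ubm, n_j):
    """CLR between clusters i and j (one common formulation, assumed here).

    ll_x_under_m: total log-likelihood of cluster x's frames under model m.
    n_i, n_j: number of frames in clusters i and j.
    """
    return ((ll_i_under_j - ll_i_under_ubm) / n_i
            + (ll_j_under_i - ll_j_under_ubm) / n_j)


def keep_merging(best_clr, n_clusters, clr_threshold=None, n_speakers=None):
    """Decide whether agglomerative clustering should perform another merge.

    Telephone-style rule: stop once n_clusters reaches a known n_speakers.
    Multi-domain rule: stop once the best (highest) CLR between any two
    clusters no longer exceeds clr_threshold.
    """
    if n_speakers is not None:
        return n_clusters > n_speakers
    if clr_threshold is None:
        raise ValueError("need either n_speakers or clr_threshold")
    return best_clr > clr_threshold


# Hypothetical log-likelihood totals for two clusters.
score = cross_likelihood_ratio(-1000.0, -1100.0, 100, -480.0, -500.0, 50)
print(keep_merging(score, n_clusters=5, clr_threshold=0.5))  # True
print(keep_merging(score, n_clusters=2, n_speakers=2))       # False
```

The threshold rule needs no knowledge of how many speakers a recording contains, which is what makes it usable for broadcast news; the price is that the threshold itself must be tuned.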

Relevance: 60.00%

Abstract:

In this paper we propose a novel scheme for carrying out speaker diarization iteratively. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpora to evaluate our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.

Relevance: 60.00%

Abstract:

Objective: To follow up previous studies highlighting a possible role for cytochrome P450, family 2, subfamily C, 19 (CYP2C19) in susceptibility to endometriosis by searching for additional variants in the CYP2C19 gene that may be associated with the disease. Design: Case-control study. Setting: Academic research. Subject(s): The cases comprised 2,271 women with surgically confirmed endometriosis; the controls comprised 939 women with self-report of no endometriosis and 1,770 unscreened population samples. Intervention(s): Sequencing of the CYP2C19 region and follow-up of 80 single nucleotide polymorphisms (SNPs) in two case-control samples. Main Outcome Measure(s): Allele frequency differences between cases and controls. Result(s): Sequencing of the CYP2C19 gene region resulted in the detection of a large number of known and novel SNPs. Genotyping of 80 polymorphic SNPs in 901 endometriosis cases and 939 controls resulted in study-wide significant association signals for SNPs in moderate or complete linkage disequilibrium with rs4244285, a functional SNP in exon 5 that abrogates CYP2C19 function through the creation of an alternative splice site. Evidence of association was also detected for another functional SNP in the CYP2C19 promoter, rs12248560, which was highlighted in our previous study. Conclusion(s): Functional variants in CYP2C19 may contribute to endometriosis susceptibility in both familial and sporadic cases. © 2014 by American Society for Reproductive Medicine.
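The main outcome measure above is an allele frequency difference between cases and controls. A standard way to test one SNP is a Pearson chi-square test on a 2x2 table of allele counts (cases/controls by minor/major allele). The sketch below assumes allele counts have already been tallied, with each diploid individual contributing two alleles. The abstract does not say which test the study actually used, and the function name and counts are hypothetical.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Rows of the 2x2 table are cases and controls; columns are the two alleles.
    Returns (case minor-allele frequency, control minor-allele frequency,
    chi-square statistic, p-value).
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return a / (a + b), c / (c + d), chi2, p_value


# Hypothetical counts: 100 case alleles and 100 control alleles.
print(allelic_chi_square(30, 70, 20, 80))
```

In a study testing 80 SNPs, a "study-wide significant" signal implies the per-SNP p-value was judged against a multiple-testing-corrected threshold rather than 0.05.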

Relevance: 60.00%

Abstract:

We present a clustering-only approach to the problem of speaker diarization to eliminate the need for the commonly employed and computationally expensive Viterbi segmentation and realignment stage. We use multiple linear segmentations of a recording and carry out complete-linkage clustering within each segmentation scenario to obtain a set of clustering decisions for each case. We then collect all clustering decisions, across all cases, to compute a pairwise vote between the segments and conduct complete-linkage clustering to cluster them at a resolution equal to the minimum segment length used in the linear segmentations. We use our proposed cluster-voting approach to carry out speaker diarization and linking across the SAIVT-BNEWS corpus of Australian broadcast news data. We compare our technique to an equivalent baseline system with Viterbi realignment and show that our approach can outperform the baseline technique with respect to the diarization error rate (DER) and attribution error rate (AER).
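The cluster-voting step above collects clustering decisions from several linear segmentations and turns them into a pairwise vote between segments. A minimal sketch follows, assuming each segmentation's labels have already been mapped onto the same minimum-length segments. The vote is expressed as a distance (the fraction of clusterings that did *not* group two segments together) so it can feed complete linkage directly. `vote_distance` and the example labelings are hypothetical.

```python
from itertools import combinations


def vote_distance(labelings):
    """Pairwise 'vote' distance between segments from several clusterings.

    labelings: list of label sequences, one per linear segmentation, each
    already mapped onto the same finest-resolution segments (assumption).
    Returns a symmetric matrix where 0.0 means every clustering put the two
    segments together and 1.0 means none did.
    """
    n = len(labelings[0])
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        dist[i][j] = dist[j][i] = 1.0 - together / len(labelings)
    return dist


# Three hypothetical clusterings of five minimum-length segments.
labelings = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [5, 5, 7, 7, 7],  # label values are arbitrary; only co-membership counts
]
for row in vote_distance(labelings):
    print([round(d, 2) for d in row])
```

For this example, complete-linkage clustering of the matrix with a threshold of 0.5 yields two groups, segments {0, 1} and {2, 3, 4}. Segment 2 joins the second group even though one clustering disagreed, because a majority vote keeps its distances there at 1/3.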

Relevance: 60.00%

Abstract:

Recurring water stresses are a major risk factor for rainfed maize cropping across the highly diverse agro-ecological environments of Queensland (Qld) and northern New South Wales (NNSW). Enhanced understanding of such agro-ecological diversity is necessary to sample target production environments more consistently for testing and targeting the release of improved germplasm, and to improve the efficiency of the maize pre-breeding and breeding programs of Qld and New South Wales. Here, we used the Agricultural Production Systems Simulator (APSIM), a well-validated maize crop model, to characterize the key distinctive water stress patterns and risk to production across the main maize-growing regions of Qld and NNSW, located between 15.8° and 31.5°S and between 144.5° and 151.8°E. APSIM was configured to simulate daily water supply/demand ratios (SDRs) around anthesis, as an indicator of the degree of water stress, and the final grain yield. Simulations were performed using daily climatic records for the period 1890 to 2010 for 32 sites-soils in the target production regions. The runs assumed adequate nitrogen supply for the mid-season maize hybrid Pioneer 3153. Hierarchical complete-linkage analyses of the simulated yield resulted in five major clusters showing distinct probability distributions of the expected yields and distinct geographic patterns. The drought stress patterns and their frequencies using SDRs were quantified using multivariate statistical methods. The identified stress patterns included no stress, mid-season (flowering) stress, and three terminal stresses differing in severity. The combined frequency of flowering and terminal stresses was highest (82.9%), mainly in sites-soils combinations in the west of Qld and NNSW. Yield variability across the different sites-soils was significantly related to the variability in frequencies of water stresses.
Frequencies of water stresses within each yield cluster tended to be similar, but different across clusters. Sites-soils falling within each yield cluster therefore could be treated as distinct maize production environments for testing and targeting newly developed maize cultivars and hybrids for adaptation to water stress patterns most common to those environments.
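The supply/demand ratio (SDR) above compares the water a crop can take up with the water it needs, so values below 1 indicate stress. The abstract does not give the averaging window around anthesis. This sketch therefore treats the window half-width as a hypothetical parameter and assumes a day with zero demand counts as unstressed. Both function names and the daily values are illustrative, not APSIM outputs.

```python
def supply_demand_ratios(supply, demand):
    """Daily water supply/demand ratio; below 1 means supply fell short of demand."""
    # Assumption: a day with no demand is treated as unstressed (ratio 1.0).
    return [s / d if d > 0 else 1.0 for s, d in zip(supply, demand)]


def mean_sdr_around(sdr, anthesis_day, half_window):
    """Average SDR over +/- half_window days centred on the anthesis day index."""
    start = max(0, anthesis_day - half_window)
    window = sdr[start:anthesis_day + half_window + 1]
    return sum(window) / len(window)


# Hypothetical daily supply and demand (mm) for five days; anthesis on day 2.
sdr = supply_demand_ratios([2.0, 2.0, 1.0, 1.0, 2.0], [2.0] * 5)
print(sdr)                                             # [1.0, 1.0, 0.5, 0.5, 1.0]
print(mean_sdr_around(sdr, anthesis_day=2, half_window=1))
```

Per-season summaries like this, collected over many simulated years, are the kind of input the multivariate analysis of stress patterns and their frequencies would work from.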

Relevance: 60.00%

Abstract:

There is substantial evidence for a susceptibility gene for late-onset Alzheimer's disease (AD) on chromosome 10. One of the characteristic features of AD is the degeneration and dysfunction of the cholinergic system. The genes encoding choline acetyltransferase (ChAT) and its vesicular transporter (VAChT), CHAT and SLC18A3 respectively, map to the linked region of chromosome 10 and are therefore both positional and obvious functional candidate genes for late-onset AD. We have screened both genes for sequence variants and investigated each for association with late-onset AD in up to 500 late-onset AD cases and 500 control DNAs collected in the UK. We detected a total of 17 sequence variants. Of these, 14 were in CHAT, comprising three non-synonymous variants (D7N in the S exon, A120T in exon 5 and L243F in exon 8), one synonymous change (H547H), nine single-nucleotide polymorphisms in intronic, untranslated or promoter regions, and a variable number of tandem repeats in intron 7. Three non-coding SNPs were detected in SLC18A3. None demonstrated any reproducible association with late-onset AD in our samples. Levels of linkage disequilibrium were generally low across the CHAT locus but two of the coding variants, D7N and A120T, proved to be in complete linkage disequilibrium.
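"Complete linkage disequilibrium" between two variants such as D7N and A120T is usually read as |D'| = 1, meaning at least one of the four two-locus haplotypes is absent. The sketch below computes D, |D'| and r² from haplotype frequencies for two biallelic loci. The abstract does not say which measure the study reported, and the function name and frequencies are hypothetical.

```python
def ld_measures(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from haplotype frequencies."""
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A at the first locus
    p_B = f_AB + f_aB  # frequency of allele B at the second locus
    d = f_AB - p_A * p_B
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d ** 2 / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical haplotype frequencies with the Ab haplotype absent.
print(ld_measures(0.1, 0.0, 0.3, 0.6))
```

Here |D'| is 1 (complete LD) while r² is only about 0.17, because the two variants have different allele frequencies; r² reaches 1 only when just two complementary haplotypes exist.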

Relevance: 60.00%

Abstract:

Where there is genetically based variation in selfishness and altruism, as in humans, altruists with an innate ability to recognise, and thereby only help, their altruistic relatives may evolve. Here we use diploid population genetic models to chart the evolution of genetically based discrimination in populations initially in stable equilibrium between altruism and selfishness. The initial stable equilibria occur because help is assumed to be subject to diminishing returns. Similar results were obtained whether we used a model with two independently inherited loci, one controlling altruism and the other discrimination, or a one-locus model with three alleles. The latter is the opposite extreme to the first model, and can be thought of as involving complete linkage between two loci on the same chromosome. The introduction of discrimination reduced the benefits obtained by selfish individuals, more so as the number of discriminators increased, and selfishness was eventually eliminated in some cases. In others, selfishness persisted and the evolutionary outcome was a stable equilibrium involving selfish individuals and both discriminating and non-discriminating altruists. Heritable variation in selfishness, altruism and discrimination is predicted to be particularly evident among full sibs. The suggested coexistence of these three genetic dispositions could explain widespread interest within human social groups as to who will and who will not help others. These predictions merit experimental and observational investigation by primatologists, anthropologists and psychologists. Keywords: Population genetics, Diploid, Heritability, Prosocial, Behaviour genetics
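The two extremes contrasted above can be stated with the recombination fraction r: independently inherited loci have r = 0.5, while complete linkage means r = 0. Under random mating with no selection, drift or mutation, linkage disequilibrium decays as D_t = (1 - r)^t D_0. The sketch below illustrates only that textbook relationship. It does not reproduce the paper's selection model, and `ld_after` is a hypothetical name.

```python
def ld_after(d0, recombination_fraction, generations):
    """D_t = (1 - r)^t * D_0 under random mating with no selection, drift or mutation."""
    return d0 * (1 - recombination_fraction) ** generations


print(ld_after(0.25, 0.0, 100))  # complete linkage: association never decays
print(ld_after(0.25, 0.5, 2))    # independent loci: halves every generation
```

With complete linkage the altruism and discrimination alleles travel together as a fixed haplotype, which is why a single locus with three alleles can stand in for two completely linked loci.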

Relevance: 60.00%

Abstract:

The objective of this work was to compare different multivariate techniques for characterizing 35 sesame genotypes using 769 RAPD markers. Genetic distances were obtained as the arithmetic complement of the Jaccard coefficient and clustered by the hierarchical nearest-neighbour (single-linkage), furthest-neighbour (complete-linkage) and unweighted pair-group method with arithmetic mean (UPGMA) methods, by Tocher's optimization method, and by principal coordinate analysis. The grouping of the genotypes changed depending on the method used. Adopting the same genetic distance (0.36) as the cut-off value, four groups were distinguished by the nearest-neighbour method, 13 by the furthest-neighbour method, 11 by UPGMA and four by Tocher's method. Among the hierarchical methods, UPGMA showed the best fit between the original and estimated distances (CCC = 0.89). The principal coordinate analyses confirmed the low diversity among the genotypes. The greatest divergence occurred between the cultivars Seridó 1 and Arawaca 4, and the smallest between the genotypes VCR-101 and GP-3314. The first three principal coordinates accounted for 35.13% of the total variability, and 18 eigenvalues were needed to explain 81% of the genetic variation. The UPGMA and Tocher optimization methods and the principal coordinate analyses are complementary in forming the groups.
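The sesame study's genetic distance is one minus the Jaccard coefficient on binary RAPD profiles: shared presences divided by shared presences plus mismatches, ignoring shared absences. Its fit statistic (CCC, presumably the cophenetic correlation coefficient) is the Pearson correlation between the original distances and the distances implied by the dendrogram. A minimal sketch of both, assuming the cophenetic distances have already been read off a dendrogram. The function names and example profiles are hypothetical.

```python
import math


def jaccard_distance(x, y):
    """1 - Jaccard similarity for binary (presence/absence) marker profiles.

    Shared absences (0, 0) are ignored, as in the Jaccard coefficient.
    """
    shared = sum(1 for u, v in zip(x, y) if u and v)
    mismatched = sum(1 for u, v in zip(x, y) if bool(u) != bool(v))
    if shared + mismatched == 0:
        return 0.0  # assumption: two all-absent profiles are treated as identical
    return 1.0 - shared / (shared + mismatched)


def pearson(xs, ys):
    """Pearson correlation; with original vs. cophenetic distances this gives the CCC."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical RAPD band profiles (1 = band present).
print(jaccard_distance([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # 0.5
```

A CCC close to 1, such as the 0.89 reported for UPGMA, means the dendrogram distorts the original pairwise distances relatively little.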

Relevance: 60.00%

Abstract:

To verify whether there is temporal and spatial variability in the sample size of the monthly mean minimum air temperature in thirty-seven municipalities of Rio Grande do Sul, minimum air temperature data from 1931 to 2000 were used. The sample size of the monthly mean minimum air temperature was determined for each month and municipality. Cluster analysis of the months and municipalities was carried out using the hierarchical furthest-neighbour (complete-linkage) method. The sample size (number of years) needed to estimate the monthly mean minimum air temperature in the State of Rio Grande do Sul varies in both time and space. Larger sample sizes are required in the State of Rio Grande do Sul in May, June and July, decreasing gradually towards January and December. The sample size also varies among the municipalities of the State of Rio Grande do Sul.

Relevance: 60.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)