942 resultados para speaker linking


Relevância:

20.00% 20.00%

Publicador:

Resumo:

For several reasons, the Fourier phase domain is less favored than the magnitude domain in signal processing and modeling of speech. To correctly analyze the phase, several factors must be considered and compensated, including the effect of the step size, windowing function and other processing parameters. Building on a review of these factors, this paper investigates a spectral representation based on the Instantaneous Frequency Deviation, but in which the step size between processing frames is used in calculating phase changes, rather than the traditional single sample interval. Reflecting these longer intervals, the term delta-phase spectrum is used to distinguish this from instantaneous derivatives. Experiments show that mel-frequency cepstral coefficients features derived from the delta-phase spectrum (termed Mel-Frequency delta-phase features) can produce broadly similar performance to equivalent magnitude domain features for both voice activity detection and speaker recognition tasks. Further, it is shown that the fusion of the magnitude and phase representations yields performance benefits over either in isolation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This document outlines the system submitted by the Speech and Audio Research Laboratory at the Queensland University of Technology (QUT) for the Speaker Identity Verification: Application task of EVALITA 2009. This competitive submission consisted of a score-level fusion of three component systems; a joint-factor analysis GMM system and two SVM systems using GLDS and GMM supervector kernels. Development evaluation and post-submission results are presented in this study, demonstrating the effectiveness of this fused system approach. This study highlights the challenges associated with system calibration from limited development data and that mismatch between training and testing conditions continues to be a major source of error in speaker verification technology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant sized, sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: The 2003 Bureau of Labor Statistics American Time Use Survey (ATUS) contains 438 distinct primary activity variables that can be analyzed with regard to how time is spent by Americans. The Compendium of Physical Activities is used to code physical activities derived from various surveys, logs, diaries, etc to facilitate comparison of coded intensity levels across studies. ------ ----- Methods: This paper describes the methods, challenges, and rationale for linking Compendium estimates of physical activity intensity (METs, metabolic equivalents) with all activities reported in the 2003 ATUS. ----- ----- Results: The assigned ATUS intensity levels are not intended to compute the energy costs of physical activity in individuals. Instead, they are intended to be used to identify time spent in activities broadly classified by type and intensity. This function will complement public health surveillance systems and aid in policy and health-promotion activities. For example, at least one of the future projects of this process is the descriptive epidemiology of time spent in common physical activity intensity categories. ----- ----- Conclusions: The process of metabolic coding of the ATUS by linking it with the Compendium of Physical Activities can make important contributions to our understanding of Americans’ time spent in health-related physical activity.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objective: to assess the accuracy of data linkage across the spectrum of emergency care in the absence of a unique patient identifier, and to use the linked data to examine service delivery outcomes in an emergency department setting. Design: automated data linkage and manual data linkage were compared to determine their relative accuracy. Data were extracted from three separate health information systems: ambulance, ED and hospital inpatients, then linked to provide information about the emergency journey of each patient. The linking was done manually through physical review of records and automatically using a data linking tool (Health Data Integration) developed by the CSIRO. Match rate and quality of the linking were compared. Setting: 10, 835 patient presentations to a large, regional teaching hospital ED over a two month period (August-September 2007). Results: comparison of the manual and automated linkage outcomes for each pair of linked datasets demonstrated a sensitivity of between 95% and 99%; a specificity of between 75% and 99%; and a positive predictive value of between 88% and 95%. Conclusions: Our results indicate that automated linking provides a sound basis for health service analysis, even in the absence of a unique patient identifier. The use of an automated linking tool yields accurate data suitable for planning and service delivery purposes and enables the data to be linked regularly to examine service delivery outcomes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The individual-opportunity nexus emphasizes that both the characteristics of individuals and venture ideas have roles in the entrepreneurial process (Shane & Venkataraman, 2000). Following upon this assertion the present study examined whether the venture idea novelty and investment of resources can make an important part in the venture creation process. Data analysed for a sample of nascent entrepreneurs in Australia suggests that the novelty of venture ideas restricts the performance of nascent ventures. However, the more investment of time and money do not show a significant impact to the venture performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A vast proportion of companies nowadays are looking to design and are focusing on the end users as a means of driving new projects. However still many companies are drawn to technological improvements which drive innovation within their industry context. The Australian livestock industry is no different. To date the adoption of new products and services within the livestock industry has been documented as being quite slow. This paper investigates how disruptive innovation should be a priority for these technologically focused companies and demonstrates how the use of design led innovation can bring about a higher quality engagement between end user and company alike. A case study linking participatory design and design thinking is presented. Within this, a conceptual model of presenting future scenarios to internal and external stakeholders is applied to the livestock industry; assisting companies to apply strategy, culture and advancement in meaningful product offerings to consumers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we extend the concept of speaker annotation within a single-recording, or speaker diarization, to a collection wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering expectantly homogenous intersession clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix and the complete linkage algorithm is employed to conduct clustering of the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Gaussian mixture models (GMMs) have become an established means of modeling feature distributions in speaker recognition systems. It is useful for experimentation and practical implementation purposes to develop and test these models in an efficient manner particularly when computational resources are limited. A method of combining vector quantization (VQ) with single multi-dimensional Gaussians is proposed to rapidly generate a robust model approximation to the Gaussian mixture model. A fast method of testing these systems is also proposed and implemented. Results on the NIST 1996 Speaker Recognition Database suggest comparable and in some cases an improved verification performance to the traditional GMM based analysis scheme. In addition, previous research for the task of speaker identification indicated a similar system perfomance between the VQ Gaussian based technique and GMMs

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a novel technique for the tracking of moving lips for the purpose of speaker identification. In our system, a model of the lip contour is formed directly from chromatic information in the lip region. Iterative refinement of contour point estimates is not required. Colour features are extracted from the lips via concatenated profiles taken around the lip contour. Reduction of order in lip features is obtained via principal component analysis (PCA) followed by linear discriminant analysis (LDA). Statistical speaker models are built from the lip features based on the Gaussian mixture model (GMM). Identification experiments performed on the M2VTS1 database, show encouraging results

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms the performance of either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise