939 results for cross-language speaker recognition
Abstract:
This paper describes a novel approach to phonotactic language identification (LID) in which, instead of using soft counts based on phoneme lattices, we use posteriorgrams to obtain n-gram counts. The high-dimensional count vectors are reduced to low-dimensional units, for which we adapt the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set, with better results than a system based on soft counts (Cavg on 30s: 3.15% vs 3.43%) and very good results when fused with an acoustic i-vector LID system (Cavg on 30s acoustic 2.4% vs 1.25%). The proposed technique is also compared with another low-dimensional projection system based on PCA. Compared with the original soft counts, the proposed technique provides better results, reduces the problems caused by sparse counts, and avoids the need for pruning techniques when creating the lattices.
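The idea of reading n-gram counts directly off a posteriorgram can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes a frame-level posteriorgram (one probability vector per frame) and accumulates expected bigram counts as products of posteriors in consecutive frames. The function name `expected_bigram_counts` and the toy data are hypothetical.

```python
from itertools import product


def expected_bigram_counts(posteriorgram):
    """Accumulate expected bigram counts from a frame-level posteriorgram.

    posteriorgram: list of frames, each a list of phone posteriors summing to 1.
    Returns a dict mapping (phone_i, phone_j) to its expected count.
    Assumption: consecutive frames are treated as independent draws.
    """
    n_phones = len(posteriorgram[0])
    counts = {pair: 0.0 for pair in product(range(n_phones), repeat=2)}
    for prev, curr in zip(posteriorgram, posteriorgram[1:]):
        for i, j in counts:
            counts[(i, j)] += prev[i] * curr[j]
    return counts


# Toy posteriorgram: 3 frames, 2 phones (values are made up).
toy = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
counts = expected_bigram_counts(toy)
```

Because every frame sums to one, the counts over all bigrams add up to the number of frame pairs, so no lattice pruning step is involved in this sketch.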
Abstract:
We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with information about the intentions of the speaker (inferred by the dialogue manager and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are estimated automatically on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, which are estimated as part of the inference procedure that determines the actions to be carried out. We propose two approaches to handling the LMs related to concepts and goals. In the first, we estimate an LM for each of them; in the second, we apply several clustering strategies to group together those elements that share common properties, and estimate an LM for each cluster. Our evaluation shows that the system can estimate a dynamic model adapted to each dialogue turn. This improves speech recognition performance (a relative improvement of up to 14.82%), which in turn improves both the language understanding and the dialogue management tasks.
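A minimal sketch of the per-turn linear interpolation described above, using toy unigram models. The function name `interpolate_lms`, the fixed `background_weight`, and the rule that splits the remaining weight in proportion to the concept posteriors are assumptions for illustration; the abstract does not specify how the weights are derived from the posteriors.

```python
def interpolate_lms(background, element_lms, posteriors, background_weight=0.5):
    """Linearly interpolate a background LM with dialogue-element LMs.

    background: dict word -> probability (toy unigram LM).
    element_lms: dict element name -> dict word -> probability.
    posteriors: dict element name -> posterior probability for this turn.
    Assumption: the non-background weight is shared out in proportion
    to the posteriors, so all weights sum to one.
    """
    total = sum(posteriors.values())
    if total <= 0.0:
        # No dialogue element is active: fall back to the background LM.
        return dict(background)
    weights = {
        name: (1.0 - background_weight) * p / total
        for name, p in posteriors.items()
    }
    vocab = set(background)
    for lm in element_lms.values():
        vocab |= set(lm)
    return {
        word: background_weight * background.get(word, 0.0)
        + sum(weights[n] * element_lms[n].get(word, 0.0) for n in weights)
        for word in vocab
    }


# Hypothetical models and posteriors for a single dialogue turn.
background = {"the": 0.5, "train": 0.25, "ticket": 0.25}
element_lms = {
    "departure": {"train": 0.8, "the": 0.2},
    "price": {"ticket": 0.9, "the": 0.1},
}
posteriors = {"departure": 0.75, "price": 0.25}
adapted = interpolate_lms(background, element_lms, posteriors)
```

Since every component model is a proper distribution and the weights sum to one, the adapted model is also a proper distribution; recomputing it on each turn with new posteriors gives the dynamic behaviour the abstract describes.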
Abstract:
This paper describes our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the small number of files available for training the system, especially in the empty condition, where no training set was provided, only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy. Our primary system was the fusion of three different i-vector based systems: an acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. The paper reports official and post-evaluation results for all conditions, using both the metrics proposed for the evaluation and the Cavg metric.
Abstract:
This paper presents new techniques that bring relevant improvements to the primary system our group submitted to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for Fact, the official metric of the evaluation). We show that using these features at the phone state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations to these PLLR features, with additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
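As a rough illustration of what a PLLR feature is, the sketch below takes the log-odds (logit) of each phone posterior in a frame. This is one formulation and is assumed here; the abstract does not give the exact definition used, and some formulations add a constant normalisation term. The function name `pllr` and the `floor` parameter are hypothetical.

```python
import math


def pllr(posteriors, floor=1e-10):
    """Map one frame of phone posteriors to log-odds features.

    posteriors: list of phone posterior probabilities for a single frame.
    floor: clipping value that keeps the logarithm finite (an assumption).
    """
    features = []
    for p in posteriors:
        p = min(max(p, floor), 1.0 - floor)
        features.append(math.log(p / (1.0 - p)))
    return features


# Toy frame with three phones (values are made up).
frame_features = pllr([0.5, 0.25, 0.25])
```

Unlike raw posteriors, which are bounded and sum to one, these features are unbounded, which is one reason they are convenient inputs for projections such as PCA.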
Abstract:
A new language recognition technique is described, based on applying the philosophy of Shifted Delta Coefficients (SDC) to phone log-likelihood ratio (PLLR) features. The new methodology allows long-span phonetic information to be incorporated at a frame-by-frame level while accounting for the temporal length of each phone unit. The proposed features are used to train an i-vector based system, which is tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg over different state-of-the-art acoustic i-vector based systems. The paper also presents the integration of parallel phone ASR systems, each of which generates multiple PLLR coefficients that are stacked together and then projected into a reduced dimension. Finally, the paper shows how incorporating state information from the phone ASR provides additional improvements, and how fusion with the other acoustic and phonotactic systems provides an important improvement of 25.8% over the system presented during the competition.
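The shifted-delta stacking mentioned above can be sketched on generic per-frame feature vectors. The parameters `d` (delta spread), `p` (shift between blocks) and `k` (number of blocks) follow the usual SDC naming; the default values, the edge handling by clamping to the first or last frame, and the function name `shifted_delta` are assumptions, not the configuration used in the paper.

```python
def shifted_delta(frames, d=1, p=3, k=7):
    """Stack k delta blocks per frame, each shifted by p frames.

    frames: list of per-frame feature vectors (for example PLLR features).
    Block i at time t is frames[t + i*p + d] - frames[t + i*p - d].
    Assumption: indices outside the utterance are clamped to its edges.
    """
    n = len(frames)

    def frame_at(t):
        return frames[min(max(t, 0), n - 1)]

    stacked_frames = []
    for t in range(n):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        stacked_frames.append(stacked)
    return stacked_frames


# Toy input: 30 frames of 2-dimensional features that grow linearly in time.
frames = [[float(t), 2.0 * t] for t in range(30)]
sdc = shifted_delta(frames, d=1, p=3, k=2)
```

Each output vector has `k` times the input dimension, which is how long-span context ends up attached to every individual frame.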
Abstract:
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
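The search step can be illustrated with a small log-domain Viterbi decoder over a discrete HMM. This is a sketch under simplifying assumptions: output distributions are attached to states rather than to transitions, the model is tiny enough to search exactly, and the state names, observation symbols and probabilities are invented for the example.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path and its log probability."""
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a (possibly nested) dict of probabilities to log probabilities."""
    return {
        key: logs(val) if isinstance(val, dict) else math.log(val)
        for key, val in table.items()
    }


# Hypothetical two-state model with three discrete acoustic indices.
states = ["sil", "speech"]
start = logs({"sil": 0.6, "speech": 0.4})
trans = logs({"sil": {"sil": 0.7, "speech": 0.3},
              "speech": {"sil": 0.4, "speech": 0.6}})
emit = logs({"sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
             "speech": {"low": 0.1, "mid": 0.3, "high": 0.6}})
path, log_prob = viterbi(["low", "mid", "high"], states, start, trans, emit)
# path is ["sil", "sil", "speech"]; exp(log_prob) is about 0.01512
```

On a model this small the search is exact. With realistic vocabularies the hypothesis space is typically pruned, which is consistent with the abstract describing the result as highly likely, rather than guaranteed, to be the most probable utterance.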
Abstract:
Although the importance of CD4(+) T cell responses to human cytomegalovirus (HCMV) has recently been recognized in transplant and immunosuppressed patients, the precise specificity and nature of this response have remained largely unresolved. In the present study we have isolated CD4(+) CTL which recognize epitopes from HCMV glycoproteins gB and gH in association with two different HLA-DR antigens, DRA1*0101/DRB1*0701 (DR7) and DRA1*0101/DRB1*1101 (DR11). Comparison of amino acid sequences of HCMV isolates revealed that the gB and gH epitope sequences recognized by human CD4(+) T cells were conserved not only in clinical isolates of HCMV but also in CMV isolates from higher primates (chimpanzee, rhesus and baboon). Interestingly, these epitope sequences from chimpanzee, rhesus and baboon CMV are efficiently recognized by human CD4(+) CTL. More importantly, we show that gB-specific T cells from humans can also efficiently lyse peptide-sensitized Patr-DR7(+) cells from chimpanzees. These findings suggest that conserved gB and gH epitopes should be considered when designing a prophylactic vaccine against HCMV. In addition, they provide a functional basis for the conservation of MHC class II lineages between humans and Old World primates and open the possibility of using such primate models in vaccine development against HCMV.
Abstract:
The study examines the concept of cultural determinism in relation to the business interview, analysing differences in language use between English, French and West German native speakers. The approach is multi- and inter-disciplinary combining linguistic and business research methodologies. An analytical model based on pragmatic and speech act theory is developed to analyse language use in telephone market research interviews. The model aims to evaluate behavioural differences between English, French and West German respondents in the interview situation. The empirical research is based on a telephone survey of industrial managers, conducted in the three countries in the national language of each country. The telephone interviews are transcribed and compared across languages to discover how managers from each country use different language functions to reply to questions and requests. These differences are assessed in terms of specific cultural parameters: politeness, self-assuredness and fullness of response. Empirical and descriptive studies of national character are compared with the survey results, providing the basis for an evaluation of the relationship between management culture and national culture on a contrastive and comparative cross-cultural basis. The project conclusions focus on the implications of the findings both for business interviewing and for language teaching.
Abstract:
350 p.
Abstract:
Phonation distortion leaves relevant marks in a speaker's biometric profile, so dysphonic voice production may be used for biometric speaker characterization. In the present paper, phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, are proposed for dysphonic voice characterization in Speaker Verification tasks. The GS-derived parameters are matched in a forensic evaluation framework that defines a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephone database of 100 male native Spanish speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57% for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64% can be achieved. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense versus that of the prosecution, thus offering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading.
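For readers unfamiliar with the Equal Error Rate quoted above, the sketch below estimates it from two lists of verification scores by sweeping a threshold and taking the point where the false acceptance and false rejection rates are closest. It assumes that higher scores mean "same speaker", which is the opposite orientation of a distance-based metric, and the scores are invented; it does not reproduce the paper's evaluation.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from same-speaker and different-speaker scores.

    A trial is accepted when its score is at or above the threshold.
    Returns the mean of the two error rates at the threshold where they
    are closest (a simple estimate without interpolation).
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(s < threshold for s in genuine) / len(genuine)
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Hypothetical scores for four genuine and four impostor trials.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these toy scores one genuine trial and one impostor trial overlap, so the two error rates meet at 25%.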
Abstract:
Ten lectal variables were examined with respect to Norwegian speakers' acceptance of long-distance reflexives (LDR), using a questionnaire to elicit grammaticality judgements on 50 potential LDR sentences. A sample of 180 speakers completed the questionnaire. The data were analysed using a univariate general linear model and Spearman's correlation. In this sample, dialect and level of education had significant effects on speakers' acceptance of long-distance reflexives, while sex, age, being a native speaker, having two native-speaker parents, living in the city or the country, and the speaker's attitudes to the two Norwegian written languages had no influence. It is suggested that the influence of Danish on written Norwegian and on the southern dialects may be the cause of the observed variation with respect to LDR in Norwegian.
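Spearman's correlation, one of the two analyses named above, is the Pearson correlation computed on ranks. The sketch below implements it with average ranks for ties; the education levels and acceptance counts are invented and only show the mechanics, not the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: education level (ordinal) and number of accepted sentences.
education = [1, 2, 3, 4, 5]
accepted = [5, 6, 7, 8, 7]
rho = spearman(education, accepted)
```

Working on ranks makes the measure suitable for ordinal variables such as education level, where only the ordering of the categories is meaningful.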
Abstract:
As part of a major ongoing project, we consider and compare contemporary patterns of address pronoun use in four major European languages: French, German, Italian and Swedish. We are specifically interested in two aspects: intralingual behaviour, that is, behaviour within the same language community, and interlingual dimensions of address pronoun use. With respect to the former, we summarize our key findings to date. We then consider, in a more preliminary fashion, issues and evidence relevant to the latter.