997 results for speaker identification


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Detect and Avoid (DAA) technology is widely acknowledged as a critical enabler for unsegregated Remotely Piloted Aircraft (RPA) operations, particularly Beyond Visual Line of Sight (BVLOS). Image-based DAA, in the visible spectrum, is a promising technological option for addressing the challenges DAA presents. Two impediments to progress for this approach are the scarcity of video footage for training and testing algorithms, and the scarcity of testing regimes and specifications that facilitate repeatable, statistically valid performance assessment. This paper makes three key contributions to address these impediments. First, we detail our progress towards the creation of a large hybrid collision and near-collision encounter database. Second, we explore the suitability of techniques employed by the biometric research community (Speaker Verification and Language Identification) for DAA performance optimisation and assessment. These techniques include Detection Error Trade-off (DET) curves, Equal Error Rates (EER), and the Detection Cost Function (DCF). Finally, the hybrid database and the speech-based techniques are combined and employed in the assessment of a contemporary, image-based DAA system. This system includes stabilisation, morphological filtering and a Hidden Markov Model (HMM) temporal filter.
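As a rough illustration of the assessment measures named in this abstract, the sketch below estimates an Equal Error Rate from two lists of detector scores and evaluates a Detection Cost Function in its usual weighted form. The function names, the convention that higher scores mean "target present", and the default cost and prior values are assumptions made for the example; they are not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over every observed score.

    Assumes higher scores indicate a target. Returns (eer, threshold) at the
    threshold where the miss and false-alarm rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # Miss: a true target scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2, t)
    return best[1], best[2]


def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Weighted sum of miss and false-alarm rates (illustrative defaults)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


# Hypothetical scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, nontargets)
```

A DET curve plots the same miss and false-alarm rates against each other over all thresholds, so the loop above could also be adapted to collect the full set of operating points.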

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because the LZW algorithm efficiently extracts variable-length symbol strings from the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm, namely (i) a compression ratio score (LZW-CR) and (ii) a weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a six-language LID task on the OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique.
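The idea of scoring a token sequence by how well a language's LZW codebook compresses it can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's LZW-CR definition: the compression ratio here is simply the number of tokens divided by the number of codebook phrases needed to cover them, and all function names are hypothetical.

```python
def build_codebook(tokens, alphabet):
    """Run LZW over a training token sequence; return the phrases learned."""
    codebook = {(sym,) for sym in alphabet}
    phrase = ()
    for tok in tokens:
        candidate = phrase + (tok,)
        if candidate in codebook:
            phrase = candidate          # keep extending the current phrase
        else:
            codebook.add(candidate)     # learn the new, longer phrase
            phrase = (tok,)
    return codebook


def count_phrases(tokens, codebook):
    """Greedily parse tokens into the longest phrases of a fixed codebook."""
    max_len = max(len(p) for p in codebook)
    count, i = 0, 0
    while i < len(tokens):
        length = 1  # an unseen token is counted as a phrase of its own
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in codebook:
                length = n
                break
        count += 1
        i += length
    return count


def compression_ratio(tokens, codebook):
    """Tokens per emitted phrase; higher means the codebook fits better."""
    if not tokens:
        return 0.0
    return len(tokens) / count_phrases(tokens, codebook)


def identify(tokens, codebooks):
    """Pick the language whose codebook compresses the tokens best."""
    return max(codebooks, key=lambda lang: compression_ratio(tokens, codebooks[lang]))


# Toy "languages" over a two-symbol alphabet, for illustration only.
codebooks = {
    "A": build_codebook(list("abababab"), "ab"),
    "B": build_codebook(list("aabbaabb"), "ab"),
}
guess = identify(list("ababab"), codebooks)
```

In the toy example the test sequence follows the alternating pattern that language "A" was trained on, so its codebook covers the sequence with fewer, longer phrases than the codebook for "B".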

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Perceptual effects of room reverberation on a "sir" or "stir" test-word can be observed when the level of reverberation in the word is increased, while the reverberation in a surrounding 'context' utterance remains at a minimal level. The result is that listeners make more "sir" identifications. When the context's reverberation is also increased, to approach the level in the test word, extrinsic perceptual compensation is observed, so that the number of listeners' "sir" identifications reduces to a value similar to that found with minimal reverberation. Thus far, compensation effects have only been observed with speech or speech-like contexts in which the short-term spectrum changes as the speaker's articulators move. The results reported here show that some noise contexts with static short-term spectra can also give rise to compensation. From these experiments it would appear that compensation requires a context with a temporal envelope that fluctuates to some extent, so that parts of it resemble offsets. These findings are consistent with a rather general kind of perceptual compensation mechanism; one that is informed by the 'tails' that reverberation adds at offsets. Other results reported here show that narrow-band contexts do not bring about compensation, even when their temporal envelopes are the same as those of the more effective wideband contexts. These results suggest that compensation is confined to the frequency range occupied by the context, and that in a wideband sound it might operate in a 'band by band' manner.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Listeners were asked to identify modified recordings of the words "sir" and "stir," which were spoken by an adult male British-English speaker. Steps along a continuum between the words were obtained by a pointwise interpolation of their temporal envelopes. These test words were embedded in a longer "context" utterance, and played with different amounts of reverberation. Increasing only the test-word's reverberation shifts the listener's category boundary so that more "sir"-identifications are made. This effect reduces when the context's reverberation is also increased, indicating perceptual compensation that is informed by the context. Experiment I finds that compensation is more prominent in rapid speech, that it varies between rooms, that it is more prominent when the test-word's reverberation is high, and that it increases with the context's reverberation. Further experiments show that compensation persists when the room is switched between the context and the test word, when presentation is monaural, and when the context is reversed. However, compensation reduces when the context's reverberation pattern is reversed, as well as when noise-versions of the context are used. "Tails" that reverberation introduces at the ends of sounds and at spectral transitions may inform the compensation mechanism about the amount of reflected sound in the signal. © 2005 Acoustical Society of America.
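One way to picture the continuum between the two words is a sample-by-sample blend of two envelopes. The sketch below assumes a linear blend with evenly spaced steps and envelopes of equal length; the abstract says only "pointwise interpolation", so the exact scheme, the function name, and the toy values are assumptions.

```python
def interpolate_envelopes(env_a, env_b, n_steps):
    """Return n_steps envelopes blending env_a (first) into env_b (last).

    Each envelope is a list of amplitude samples on a shared time grid.
    Linear weighting is an assumption made for this illustration.
    """
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must have the same number of samples")
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    continuum = []
    for k in range(n_steps):
        w = k / (n_steps - 1)  # 0.0 at the first endpoint, 1.0 at the last
        continuum.append([(1 - w) * a + w * b for a, b in zip(env_a, env_b)])
    return continuum


# Toy two-sample envelopes, for illustration only.
steps = interpolate_envelopes([0.0, 1.0], [1.0, 0.0], 3)
```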

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Particular characteristics are imprinted on the voice during speech production; they stem from anatomical and physiological factors of the phonatory system and from psychosocial factors acquired by the speaker. ASR systems attempt to find the distinctive nuances of a voice and associate them with an individual or a group. Age and gender are intrinsic speaker factors that are present in the voice. This work attempts to differentiate those characteristics, isolate them, and use them to detect a speaker's gender and age. To that end, features based on the glottal pulse and the vocal tract are studied and analyzed, avoiding classical techniques (such as pitch and its derivatives) because of the restrictions inherent in those techniques. The final results reach almost 100% in gender recognition, whereas age recognition is around 80%. Factors related to the speaker's gender and hormones seem to affect the voice even though the effect is not audible.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

One of the biggest challenges in speech synthesis is the production of contextually appropriate, natural-sounding synthetic voices. This means that a Text-To-Speech (TTS) system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identification and speaking style synthesis according to the detected discourse genre. For the final implementation, a set of four genres and their corresponding speaking styles were considered: broadcast news, live sport commentaries, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When the transplanted styles were compared to the neutral voices, transplantation was significantly preferred and the similarity to the target speaker was as high as 78%.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.