Digital Library

967 results for Speaker Recognition, Text-constrained, Multilingual, Speaker Verification, HMMs

The GIAPSI NIST 2012 speaker recognition evaluation system

Relevance: 60.00%

Classical vs. biometric features in the 2013 speaker recognition evaluation in mobile environments

Relevance: 60.00%

Abstract:

MFCC coefficients extracted from the power spectral density of speech as a whole seem to have become the de facto standard in speaker recognition, as demonstrated by their use in almost all systems submitted to the 2013 Speaker Recognition Evaluation (SRE) in Mobile Environment [1], thus relegating this component of recognition systems to the background. However, in this article we show that selecting an adequate speaker characterization system is as important as selecting the classifier. To this end, we compare the recognition rates achieved by different recognition systems that rely on the same classifier (GMM-UBM) but are connected to different feature extraction systems (based on both classical and biometric parameters). As a result, we show that a gender-dependent biometric parameterization with a simple recognition system based on the GMM-UBM paradigm provides very competitive or even better recognition rates when compared to more complex classification systems based on classical features.

Toward the ultimate synthesis/recognition system.

Relevance: 60.00%

Abstract:

This paper predicts speech synthesis, speech recognition, and speaker recognition technology for the year 2001, and it describes the most important research problems to be solved in order to arrive at these ultimate synthesis and recognition systems. The problems for speech synthesis include natural and intelligible voice production, prosody control based on meaning, capability of controlling synthesized voice quality and choosing individual speaking style, multilingual and multidialectal synthesis, choice of application-oriented speaking styles, capability of adding emotion, and synthesis from concepts. The problems for speech recognition include robust recognition against speech variations, adaptation/normalization to variations due to environmental conditions and speakers, automatic knowledge acquisition for acoustic and linguistic modeling, spontaneous speech recognition, naturalness and ease of human-machine interaction, and recognition of emotion. The problems for speaker recognition are similar to those for speech recognition. The research topics related to all these techniques include the use of articulatory and perceptual constraints and evaluation methods for measuring the quality of technology and systems.

Speaker diarization: Segmentation and clustering of speeches

Relevance: 50.00%

Abstract:

Speaker diarization is the process of sorting speeches according to the speaker. Diarization helps to search and retrieve what a certain speaker uttered in a meeting. Applications of diarization systems extend to domains other than meetings, for example, lectures, telephone, television, and radio. Besides, diarization enhances the performance of several speech technologies such as speaker recognition, automatic transcription, and speaker tracking. Methodologies previously used in developing diarization systems are discussed. Prior results and techniques are studied and compared. Methods such as Hidden Markov Models and Gaussian Mixture Models that are used in speaker recognition and other speech technologies are also used in speaker diarization. The objective of this thesis is to develop a speaker diarization system in the meeting domain. The experimental part of this work indicates that zero-crossing rate can be used effectively in breaking down the audio stream into segments, and that adaptive Gaussian Models adequately fit short audio segments. Results show that 35 Gaussian Models and one second as the average length of each segment are optimum values to build a diarization system for the tested data. Uniting the segments which are uttered by the same speaker is done in a bottom-up clustering by a new approach of categorizing the mixture weights.

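The zero-crossing-rate segmentation mentioned in the abstract above can be illustrated with a minimal Python sketch. The abstract does not describe the thesis's actual procedure, so the frame length, hop size, jump threshold, and all function names below are assumptions chosen purely for illustration.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_zcr(signal, frame_len, hop):
    """ZCR of each frame of `frame_len` samples, advancing by `hop` samples."""
    return [
        zero_crossing_rate(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def change_points(zcr_values, threshold):
    """Frame indices where ZCR jumps by more than `threshold`.

    This jump rule is a hypothetical stand-in for a segment-boundary
    detector; it is not taken from the thesis.
    """
    return [
        i for i in range(1, len(zcr_values))
        if abs(zcr_values[i] - zcr_values[i - 1]) > threshold
    ]
```

For example, a signal made of a constant stretch followed by a rapidly alternating stretch yields one low-ZCR frame and one high-ZCR frame, and the jump between them is reported as a candidate boundary.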

Is professional recognition in plastic surgery related to activity in research

Relevance: 50.00%

Abstract:

OBJECTIVE: To evaluate the relationship between medical research activity and the participation of prominent plastic surgeons in congresses. METHODS: We reviewed the scientific programs of the last 3 Brazilian Congresses of Surgery and selected 21 Brazilian plastic surgeons invited to serve as panelists or speakers in roundtable sessions (Group 1). We randomly selected and paired them with other members (associates) of the Brazilian Society of Plastic Surgery who had not participated in the congresses as speakers (Group 2). We searched for articles published in journals indexed in Medline, Lilacs and SciELO by all selected doctors, covering both the entire academic career and the last 5 years (March 2007 to March 2012). We assessed research activity by simply counting the number of publications in indexed journals for each professional, and compared the number of publications between the groups. RESULTS: Articles produced throughout the career: Group 1, 639 articles (average of 30.42 articles each); Group 2, 79 articles (mean of 3.95 articles each). Difference between means: p < 0.001. CONCLUSION: The results demonstrate that the Brazilian Society of Plastic Surgery seeks professionals with a greater number of publications and with publications in journals of higher impact. This approach encourages new members to pursue higher qualifications and reassures congress attendees that they can rely on a technical criterion in the choice of speakers.

Speaker identification using models for phonemes

Relevance: 50.00%

Abstract:

Motivation for speaker recognition work is presented in the first part of the thesis, along with an exhaustive survey of past work in this field. A low-cost system that avoids complex computation has been chosen for implementation. To achieve this, a PC-based system is designed and developed. A front-end analog-to-digital converter (12 bit) is built and interfaced to a PC. Software to control the ADC and to perform various analytical functions, including feature vector evaluation, is developed. It is shown that a fixed set of phrases incorporating evenly balanced phonemes is well suited to the speaker recognition work at hand. A set of phrases is chosen for recognition. Two new methods are adopted for the feature evaluation. Some new measurements involving a symmetry check method for pitch period detection and ACE‘ are used as features. Arguments are provided to show the need for a new model for speech production. Starting from heuristics, a knowledge-based (KB) speech production model is presented. In this model, a KB provides impulses to a voice-producing mechanism, and constant correction is applied via a feedback path. It is this correction that differs from speaker to speaker. Methods of defining measurable parameters for use as features are described. Algorithms for speaker recognition are developed and implemented. Two methods are presented. The first is based on the model postulated. Here the entropy of the utterance of a phoneme is evaluated, and the transitions of voiced regions are used as speaker-dependent features. The second method uses features found in other works, but evaluated differently. A knock-out scheme is used to provide the weightage values for the selection of features. Results of implementation are presented which show an average recognition rate of 80%. It is also shown that if there are long gaps between sessions, the performance deteriorates and is speaker dependent. Cross-recognition percentages are also presented; in the worst case these rise to 30%, while in the best case they are 0%. Suggestions for further work are given in the concluding chapter.

Eventive and stative passives and copula selection in Canadian and American heritage speaker Spanish

Relevance: 50.00%

Abstract:

Spanish captures the difference between eventive and stative passives via an obligatory choice between two copulas: verbal passives take the copula ser and adjectival passives take the copula estar. In this study, we compare and contrast US and Canadian heritage speakers of Spanish on their knowledge of this difference in relation to copula choice in Spanish. The backgrounds of the target groups differ significantly from each other in that only one of them, the Canadian group, has grown up in a societal multilingual environment. We discuss the results as being supportive of two non-mutually exclusive explanatory factors: (a) French facilitates (bootstraps) the acquisition of eventive and stative passives and/or (b) the US/Canadian HS differences (e.g. status of bilingualism and the languages at stake) are a reflection of the uniqueness of the language contact situations and the effects this has on the input HSs receive.

Combining 3D and 2D for less constrained periocular recognition

Relevance: 50.00%

Abstract:

Periocular recognition has recently become an active topic in biometrics. Typically it uses 2D image data of the periocular region. This paper is the first description of combining 3D shape structure with 2D texture. A simple and effective technique using iterative closest point (ICP) was applied for 3D periocular region matching. It proved its strength for relatively unconstrained eye region capture, and does not require any training. Local binary patterns (LBP) were applied for 2D image based periocular matching. The two modalities were combined at the score-level. This approach was evaluated using the Bosphorus 3D face database, which contains large variations in facial expressions, head poses and occlusions. The rank-1 accuracy achieved from the 3D data (80%) was better than that for 2D (58%), and the best accuracy (83%) was achieved by fusing the two types of data. This suggests that significant improvements to periocular recognition systems could be achieved using the 3D structure information that is now available from small and inexpensive sensors.

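As a rough illustration of the score-level combination described in the abstract above, the following Python sketch normalizes the per-candidate scores of each modality and takes a weighted sum. The abstract does not state which normalization or weighting was used, so min-max normalization, the equal default weight, the assumption that both inputs are similarity scores (higher is better), and all function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(scores_3d, scores_2d, weight_3d=0.5):
    """Weighted sum of normalized 3D and 2D similarity scores.

    Both lists hold one score per gallery candidate, in the same order.
    `weight_3d` is an assumed tuning parameter, not a value from the paper.
    """
    norm_3d = min_max_normalize(scores_3d)
    norm_2d = min_max_normalize(scores_2d)
    return [
        weight_3d * a + (1.0 - weight_3d) * b
        for a, b in zip(norm_3d, norm_2d)
    ]


def rank1_match(fused):
    """Index of the gallery candidate with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)
```

If the 3D matcher produces distances rather than similarities (as an ICP alignment error would), they would need to be inverted before fusion; that step is left out of this sketch.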

Racial Conflict in the United States of America : A Deconstructive Perspective on Native Speaker by Changrae Lee

Relevance: 50.00%

Abstract:

Written about the time of the Golden Venture incident, Chang-rae Lee’s Native Speaker makes a particular reference to that incident, thereby implying that particular immigrants, on the grounds of their racial identities, are mistreated and considered aliens by some Americans. While some whites discriminate against immigrants, there is widespread ethnic tension between Korean Americans and African Americans. Significantly, racial conflict between Koreans and blacks and the racist attitude of some whites toward immigrants are mirrored in the relationship between the Korean-American protagonist Henry and his American wife Lelia. That is, due to their different racial identities they do not understand each other and they always argue. However, toward the end of the novel, Henry and Lelia come to understand each other. While ethnic conflict between Koreans and blacks and certain whites’ discriminatory attitudes toward immigrants are serious, the novel suggests the unimportance of racial identity. In other words, the novel concludes that there is no discriminatory treatment of immigrants and, in fact, everyone is a native speaker in America. In the novel there is no message of how racial conflict could be resolved. However, this essay suggests that by investigating how the tension between Henry and Lelia is resolved, one could suggest a solution for the ethnicity problem in America and in real life.

Guest Speaker

Relevance: 50.00%

Abstract:

David Salmela is the special guest speaker for the opening reception.

Implicit memory for the content but not the speaker of sleep-played messages

Relevance: 50.00%

Abstract:

We presented 28 sentences uttered by 28 unfamiliar speakers to sleeping participants to investigate whether humans can encode new verbal messages, learn voices of unfamiliar speakers, and form associations between speakers and messages during EEG-defined deep sleep. After waking, participants performed three tests which assessed the unconscious recognition of sleep-played speakers, messages, and speaker-message associations. Recognition performance in all tests was at chance level. However, response latencies revealed implicit memory for sleep-played messages but neither for speakers nor for speaker-message combinations. Only participants with excellent implicit memory for sleep-played messages also displayed implicit memory for speakers but not speaker-message associations. Hence, deep sleep allows for the semantic encoding of novel verbal messages.

Towards an unsupervised speaking style voice building framework: multi-style speaker diarization

Relevance: 50.00%

Abstract:

Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media sharing sites would allow developing a new generation of speech-based systems which could cope with spontaneous and styled speech. This paper proposes an architecture to deal with realistic recordings and carries out some experiments on unsupervised speaker diarization. In order to maximize the speaker purity of the clusters while keeping a high speaker coverage, the paper evaluates the F-measure of a diarization module, achieving high scores (>85%) especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).

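The trade-off between speaker purity and speaker coverage mentioned in the abstract above is commonly summarized as a harmonic mean. The sketch below assumes that definition of the F-measure; the abstract does not spell out the exact formula the paper uses, and the function name is hypothetical.

```python
def f_measure(purity, coverage):
    """Harmonic mean of cluster purity and speaker coverage (both in [0, 1]).

    Assumed definition: F = 2 * P * C / (P + C). Returns 0.0 when both
    inputs are zero to avoid division by zero.
    """
    if purity + coverage == 0:
        return 0.0
    return 2.0 * purity * coverage / (purity + coverage)
```

Because the harmonic mean is dominated by the smaller value, a clustering with perfect purity but only half coverage scores about 0.67 rather than 0.75, which is why a high F-measure requires both quantities to be high.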

Development of a genre-dependent TTS system with cross-speaker speaking-style transplantation

Relevance: 50.00%

Abstract:

One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identification and speaking style synthesis according to the detected discourse genre. For the final implementation, a set of four genres and their corresponding speaking styles were considered: broadcast news, live sport commentaries, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When the transplanted styles were compared to the neutral voices, transplantation was significantly preferred and the similarity to the target speaker was as high as 78%.

An Evening with Dr. Venus Opal Reese, Motivational Speaker, Business Woman, and Author

Relevance: 50.00%

Abstract:

This Droppin' Knowledge Lecture Series will challenge the audience to "defy the impossible". Dr. Venus Opal Reese will offer insight on how to achieve success on Thursday, September 17, at 6:30pm in Martin Luther King Hall-Thomas Pawley Theatre, 812 E. Dunklin Street. Reese, an inspirational speaker, business mentor and marketing strategist, offers training to professionals, particularly entrepreneurs and executives, on how to "defy their impossible" to reach million-dollar success. Reese has consulted for O Magazines, and appeared on ABC and CBS News. For more information on Dr. Venus Opal Reese, please visit http://defyimpossible.com.

Floyd Starr and Bud Guest, speaker at dedication of Starr AV Room, 17 September 1977 [negative 5139]

Relevance: 50.00%
