Evaluation of Speaker Normalization Methods for Vowel Recognition Using Fuzzy ARTMAP and K-NN


Autoria(s): Carpenter, Gail A.; Govindarajan, Krishna
Data(s)

14/11/2011

14/11/2011

01/01/1993

Resumo

A procedure that uses fuzzy ARTMAP and K-Nearest Neighbor (K-NN) categorizers to evaluate intrinsic and extrinsic speaker normalization methods is described. Each classifier is trained on preprocessed, or normalized, vowel tokens from about 30% of the speakers of the Peterson-Barney database, then tested on data from the remaining speakers. Intrinsic normalization methods included one nonscaled, four psychophysical scales (bark, bark with end-correction, mel, ERB), and three log scales, each tested on four different combinations of the fundamental (Fo) and the formants (F1 , F2, F3). For each scale and frequency combination, four extrinsic speaker adaptation schemes were tested: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). A total of 32 intrinsic and 128 extrinsic methods were thus compared. Fuzzy ARTMAP and K-NN showed similar trends, with K-NN performing somewhat better and fuzzy ARTMAP requiring about 1/10 as much memory. The optimal intrinsic normalization method was bark scale, or bark with end-correction, using the differences between all frequencies (Diff All). The order of performance for the extrinsic methods was LT, CSi, LS, and CS, with fuzzy AHTMAP performing best using bark scale with Diff All; and K-NN choosing psychophysical measures for all except CSi.

British Petroleum (89-A-1204); Defense Advanced Research Projects Agency (AFOSR-90-0083, ONR-N00014-92-J-4015); National Science Foundation (IRI-90-00530); Office of Naval Research (N00014-91-J-4100); Air Force Office of Scientific Research (F49620-92-J-0225)

Identificador

http://hdl.handle.net/2144/1990

Idioma(s)

en_US

Publicador

Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems

Relação

BU CAS/CNS Technical Reports;CAS/CNS-TR-1993-013

Direitos

Copyright 1993 Boston University. Permission to copy without fee all or part of this material is granted provided that: 1. The copies are not made or distributed for direct commercial advantage; 2. the report title, author, document number, and release date appear, and notice is given that copying is by permission of BOSTON UNIVERSITY TRUSTEES. To copy otherwise, or to republish, requires a fee and / or special permission.

Boston University Trustees

Tipo

Technical Report