Improved voice activity detection algorithm using wavelet and support vector machine


Author(s): CHEN, Shi-Huang; GUIDO, Rodrigo Capobianco; TRUONG, Trieu-Kien; CHANG, Yaotsu
Contributor(s)

UNIVERSIDADE DE SÃO PAULO

Date(s)

20/10/2012

20/10/2012

2010

Abstract

This paper proposes an improved voice activity detection (VAD) algorithm using wavelets and a support vector machine (SVM) for the European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via a wavelet filter bank and a wavelet-based pitch/tone detection algorithm. The wavelet filter bank divides the input speech signal into several frequency bands so that the signal power level in each sub-band can be calculated. In addition, the background noise level in each sub-band can be estimated using the wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. The proposed algorithm then applies an SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech signal. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database under different noise conditions show that the proposed algorithm achieves VAD performance considerably superior to that of AMR-NB VAD Options 1 and 2 and AMR-WB VAD. © 2009 Elsevier Ltd. All rights reserved.
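
The sub-band power step described in the abstract can be illustrated with a minimal sketch. It assumes a Haar wavelet, three decomposition levels, and a 20 ms frame at 8 kHz; none of these choices is stated in the abstract, and the function names `haar_step` and `subband_powers` are hypothetical. It is not the authors' filter bank, only an example of how a wavelet decomposition yields one power value per frequency band. Values like these, together with the noise level, pitch period, and flags, would form the feature vector passed to the SVM, which is not shown here.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def subband_powers(frame, levels=3):
    """Mean power per sub-band, ordered from the highest-frequency band
    (first detail level) down to the lowest (final approximation)."""
    powers = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        powers.append(sum(d * d for d in detail) / len(detail))
    powers.append(sum(a * a for a in approx) / len(approx))
    return powers


# Assumed test frames: 160 samples = 20 ms at 8 kHz.
frame_low = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
frame_high = [math.sin(2 * math.pi * 3500 * n / 8000) for n in range(160)]

powers_low = subband_powers(frame_low)    # energy concentrated in the last band
powers_high = subband_powers(frame_high)  # energy concentrated in the first band
```

With these assumed frames, the 200 Hz tone puts most of its power in the lowest band and the 3500 Hz tone in the highest, which is the kind of per-band contrast a VAD decision rule can exploit.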

National Science Council, Taiwan, ROC [NSC 97-2221-E-366010-MY3]

Identifier

COMPUTER SPEECH AND LANGUAGE, v.24, n.3, p.531-543, 2010

0885-2308

http://producao.usp.br/handle/BDPI/30162

10.1016/j.csl.2009.06.002

http://dx.doi.org/10.1016/j.csl.2009.06.002

Language(s)

eng

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

Relation

Computer Speech and Language

Rights

restrictedAccess

Copyright ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

Keywords

#Voice activity detection (VAD) #AMR-NB #AMR-WB #Wavelet transform #Support vector machine (SVM) #Computer Science, Artificial Intelligence

Type

article

original article

publishedVersion