Digital Library

Content Classification of Multimedia Documents using Partitions of Low-Level Features

**Author(s):** Leopold, Edda; Kindermann, Jörg
Date(s)	22/05/2006 02/01/2007
Abstract	Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, standard text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities, word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram, and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. Audio-visual documents are represented by the frequencies of their word analogues, following the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50% to 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio, and video.
Identifier	urn:nbn:de:0009-6-7607 http://www.jvrb.org/past-issues/3.2006/760
Language(s)	eng
Rights	DPPL
Source	JVRB - Journal of Virtual Reality and Broadcasting ; 3(2006) , 6
Keywords	#004 #http://dewey.info/class/004/ #Audio-visual content classification #integration of modalities #speech recognition #support vector machines #swd: Inhaltsanalyse #swd: Automatische Spracherkennung #swd: Support-Vektor-Maschine
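
The representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how audio and video words are derived from low-level features, so the sketch assumes a fixed codebook of centroids and assigns each feature vector to its nearest centroid, which is one way to partition a feature space. The function names, the modality prefixes, and all toy values are hypothetical. The support vector machine step is omitted; the resulting count vector is what a 1 vs. n classifier would take as input.

```python
from collections import Counter
from math import dist


def quantize(frames, codebook):
    """Map each low-level feature vector to the index of its nearest centroid.

    Assumption: nearest-centroid assignment stands in for the (unspecified)
    partitioning used to build "audio words" and "video words".
    """
    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]


def bag_of_words(tokens, modality):
    """Count token frequencies, tagging each token with its modality."""
    return Counter(f"{modality}:{token}" for token in tokens)


def f_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data for one news item.
video_codebook = [(0.0, 0.0), (1.0, 1.0)]   # e.g. centroids in a color-feature space
audio_codebook = [(0.0,), (5.0,)]           # e.g. centroids in a spectral-feature space
video_frames = [(0.1, 0.2), (0.9, 0.8), (1.1, 1.0)]
audio_frames = [(0.3,), (4.6,)]
syllables = ["ta", "ges", "schau", "ta"]    # syllable tokens from speech

# Combining modalities: the per-modality counts are merged into one
# bag-of-words vector, so all three are represented uniformly.
document = (
    bag_of_words(syllables, "speech")
    + bag_of_words(quantize(video_frames, video_codebook), "video")
    + bag_of_words(quantize(audio_frames, audio_codebook), "audio")
)
```

Prefixing each token with its modality keeps the three vocabularies disjoint, so a modality can be added to or dropped from the combined vector without affecting the others, which matches the abstract's observation that the best combination depends on the category.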
