Content Classification of Multimedia Documents using Partitions of Low-Level Features
Date(s) | 22/05/2006; 02/01/2007 |
---|---|
Abstract | Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, standard text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities, word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words make it possible to represent the different modalities in a uniform way. Each audio-visual document is then represented by the frequencies of its word analogues, following the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms classification based on any other single modality. Combining speech with non-speech audio improves classification, and supplementing speech and non-speech audio with video words improves it further. Optimal F-scores range from 62% to 94%, corresponding to 50% to 84% above chance. The optimal combination of modalities depends on the category to be recognized. Constructing audio and video words from low-level features provides a good basis for integrating speech, non-speech audio and video. |
Identifier | urn:nbn:de:0009-6-7607 |
Language(s) | eng |
Rights | DPPL |
Source | JVRB - Journal of Virtual Reality and Broadcasting; 3 (2006), 6 |
Keywords | #004 #http://dewey.info/class/004/ #Audio-visual content classification #integration of modalities #speech recognition #support vector machines #swd: Inhaltsanalyse (content analysis) #swd: Automatische Spracherkennung (automatic speech recognition) #swd: Support-Vektor-Maschine (support vector machine) |
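The abstract's pipeline turns low-level feature vectors into discrete "words" and represents each document by the frequencies of those words. The record does not say how the feature space is partitioned into words. The following minimal sketch therefore rests on two assumptions:

- Each feature vector is assigned to its nearest centroid in a fixed codebook, such as one learned by k-means.
- Modalities are combined by concatenating their frequency vectors.

The names `to_words`, `bag_of_words` and `fuse` are hypothetical. The 1 vs. n SVM classifier that would consume the resulting vectors is not shown.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def nearest_word(vector, codebook):
    """Index of the codebook centroid closest to `vector` (Euclidean).

    Assumption: a "word" is one cell of a nearest-centroid partition of the
    low-level feature space (e.g. a k-means codebook); the source record does
    not specify the partitioning method.
    """
    return min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))


def to_words(frames, codebook):
    """Map each low-level feature vector (one per frame or window) to a word id."""
    return [nearest_word(f, codebook) for f in frames]


def bag_of_words(words, vocab_size):
    """Relative frequency of every word id: the document's bag-of-words vector."""
    counts = Counter(words)
    total = len(words)
    return [counts[w] / total if total else 0.0 for w in range(vocab_size)]


def fuse(*modality_vectors):
    """Hypothetical early fusion: concatenate per-modality frequency vectors."""
    fused = []
    for vec in modality_vectors:
        fused.extend(vec)
    return fused


if __name__ == "__main__":
    # Toy 2-D "color features" and a 3-word codebook, for illustration only.
    codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    frames = [(0.1, 0.2), (0.9, 1.1), (4.8, 5.2), (1.2, 0.8)]
    video_words = to_words(frames, codebook)          # [0, 1, 2, 1]
    video_bow = bag_of_words(video_words, len(codebook))  # [0.25, 0.5, 0.25]
    print(video_words, video_bow)
```

Because speech syllables, audio words and video words all end up as frequency vectors of the same kind, one classifier input can be built from any combination of modalities. This is the uniform representation the abstract refers to.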