1 result for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features
in Digital Peer Publishing
Abstract:
Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, standard text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities, word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram, and color wavelet) for video, and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. Each audio-visual document is represented by the frequencies of its word analogues, following the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification, and supplementing speech and non-speech audio with video words improves it further. Optimal F-scores range between 62% and 94%, corresponding to 50% to 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio, and video.
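The representation described in the abstract can be illustrated with a minimal sketch. It assumes each document has already been reduced to token sequences per modality (syllables, video words, audio words). The function names, the `modality:token` prefixing scheme, and the toy tokens are hypothetical and are not taken from the paper. The sketch builds one shared bag-of-words vector across modalities and computes the F-score for a single category.

```python
from collections import Counter


def bag_of_words(doc):
    """Count word analogues across modalities.

    `doc` maps a modality name (e.g. "speech") to its token sequence.
    Tokens are prefixed with the modality so that identical strings from
    different modalities stay distinct (an assumption of this sketch).
    """
    counts = Counter()
    for modality, tokens in doc.items():
        for token in tokens:
            counts[f"{modality}:{token}"] += 1
    return counts


def to_vector(counts, vocabulary):
    """Turn counts into relative frequencies over a fixed vocabulary."""
    total = sum(counts.values()) or 1  # avoid division by zero for empty docs
    return [counts[term] / total for term in vocabulary]


def f_score(y_true, y_pred):
    """F-score (harmonic mean of precision and recall) for one category."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy document: syllables for speech, one hypothetical video word.
doc = {"speech": ["wahl", "er", "wahl"], "video": ["v12"]}
vocabulary = ["speech:er", "speech:wahl", "video:v12", "audio:a3"]
vector = to_vector(bag_of_words(doc), vocabulary)
```

Vectors of this kind could then be passed to any one-vs-rest SVM implementation, for example scikit-learn's `OneVsRestClassifier` wrapping `LinearSVC`. That choice is an assumption here, since the abstract does not name the toolkit used. Dropping a modality from `doc` before vectorizing gives a simple way to compare single modalities against combinations.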