A Fuzzy Decision Strategy for Topic Identification and Dynamic Selection of Language Models


Author(s): Bigi, Brigitte; De Mori, Renato; El Bèze, Marc; Spriet, Thierry
Contributor(s)

Laboratoire Informatique d'Avignon (LIA) ; Centre d'Enseignement et de Recherche en Informatique - CERI - Université d'Avignon et des Pays de Vaucluse (UAPV)

Date(s)

2000

Abstract

International audience

The paper introduces a new, effective model for topic recognition. The model follows a multi-expert decision paradigm based on fuzzy relations, in which fuzzy variables express the degree of reliability of each expert's decision. The fuzzy relations integrate heterogeneous measures, and their structure and components may evolve over time. Experiments yielded more than 80% topic classification accuracy on articles of the French newspaper Le Monde, which cover a very wide variety of facts with a very large vocabulary (on the order of 500,000 words). The experiments show a significant improvement when this multi-expert integration is used. A robust strategy for dynamic Language Model (LM) selection, based on topic recognition and switching between topic models, is also proposed. It is effective because it relies on a small set of well-trained topic-dependent LMs and on reliable topic recognition. With perplexity as the performance measure of the LM switching model, a tangible reduction is observed relative to a single, general, static LM.
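
The two ideas in the abstract, reliability-weighted combination of expert decisions and topic-driven LM switching evaluated by perplexity, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the max-min combination rule, the add-one-smoothed unigram models, the function names (select_topic, train_unigram, perplexity), the expert names, the scores, and the toy corpora are all hypothetical.

```python
import math
from collections import Counter


def select_topic(expert_scores, reliabilities):
    """Pick a topic with a max-min combination (assumed rule, not the paper's).

    expert_scores: {expert: {topic: membership score in [0, 1]}}
    reliabilities: {expert: reliability degree in [0, 1]}
    """
    combined = {}
    for expert, scores in expert_scores.items():
        for topic, score in scores.items():
            # An expert's vote is capped by how reliable that expert is.
            support = min(reliabilities[expert], score)
            combined[topic] = max(combined.get(topic, 0.0), support)
    return max(combined, key=combined.get)


def train_unigram(tokens, vocab, alpha=1.0):
    """Additive-smoothed unigram LM over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def perplexity(model, tokens):
    """exp of the negative mean log-probability of the tokens."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


# Hypothetical toy corpora standing in for topic-specific training text.
corpora = {
    "sports": "goal match team goal player".split(),
    "finance": "market stock bank market price".split(),
}
vocab = {w for tokens in corpora.values() for w in tokens}

topic_lms = {t: train_unigram(toks, vocab) for t, toks in corpora.items()}
general_lm = train_unigram(
    [w for toks in corpora.values() for w in toks], vocab
)

# Hypothetical expert outputs for one test document.
expert_scores = {
    "unigram_expert": {"sports": 0.9, "finance": 0.2},
    "keyword_expert": {"sports": 0.4, "finance": 0.7},
}
reliabilities = {"unigram_expert": 0.8, "keyword_expert": 0.5}

test_doc = "goal team match".split()
topic = select_topic(expert_scores, reliabilities)
ppl_switched = perplexity(topic_lms[topic], test_doc)
ppl_general = perplexity(general_lm, test_doc)
print(topic, round(ppl_switched, 2), round(ppl_general, 2))
```

On this toy data the topic-selected model assigns higher probability to the test document than the general model, so its perplexity is lower. This mirrors the kind of comparison the abstract describes, not its reported results.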

Identifier

hal-01392257

https://hal.archives-ouvertes.fr/hal-01392257

Language(s)

en

Publisher

HAL CCSD

Elsevier

Source

ISSN: 0165-1684

Signal Processing

Signal Processing, Elsevier, 2000, Special Issue on Fuzzy Logic in Signal Processing, 80 (6), pp.1085--1097

Keywords #Topic Detection #Language modelling #[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] #[SHS.INFO] Humanities and Social Sciences/Library and information sciences
Type

info:eu-repo/semantics/article

Journal articles