Using Kullback-Leibler Distance for Text Categorization


Author(s): Bigi, Brigitte
Contributor(s)

ADELE (LIG Laboratoire d'Informatique de Grenoble) ; Université Pierre Mendès France - Grenoble 2 (UPMF) - Université Joseph Fourier - Grenoble 1 (UJF) - Institut National Polytechnique de Grenoble (INPG) - Centre National de la Recherche Scientifique (CNRS)

Date(s)

2003

Abstract

International audience

A system that performs text categorization aims to assign appropriate categories from a predefined classification scheme to incoming documents. These assignments can be used for various purposes, such as filtering or retrieval. This paper introduces a new, effective model for text categorization on large corpora (roughly one million documents). Text categorization is performed using the Kullback-Leibler distance between the probability distribution of the document to classify and the probability distribution of each category. Using the same representation of categories, experiments show a significant improvement when this method is used: the KL-based method achieves substantial improvements over the tf-idf method.
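The classification rule described in the abstract can be sketched as follows: estimate a unigram probability distribution for each category from its training documents, estimate one for the incoming document, and assign the category whose distribution is closest under the KL divergence. This is a minimal illustration, not the paper's implementation; in particular, the additive (epsilon) smoothing used here to avoid zero probabilities and the asymmetric direction D(document || category) are assumptions, as the abstract does not specify the paper's smoothing or weighting scheme.

```python
import math
from collections import Counter

def distribution(tokens, vocab, epsilon=1e-6):
    """Smoothed unigram distribution over a fixed vocabulary.
    Additive smoothing is an assumption, not the paper's scheme."""
    counts = Counter(tokens)
    total = len(tokens) + epsilon * len(vocab)
    return {w: (counts[w] + epsilon) / total for w in vocab}

def kl_divergence(p, q):
    """D(p || q) = sum_w p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def categorize(doc_tokens, category_docs):
    """Assign the category whose distribution minimizes the KL
    divergence from the document's distribution."""
    # Build a shared vocabulary so all distributions are comparable.
    vocab = set(doc_tokens)
    for docs in category_docs.values():
        for d in docs:
            vocab.update(d)
    p_doc = distribution(doc_tokens, vocab)
    best, best_kl = None, float("inf")
    for cat, docs in category_docs.items():
        cat_tokens = [t for d in docs for t in d]
        q_cat = distribution(cat_tokens, vocab)
        kl = kl_divergence(p_doc, q_cat)
        if kl < best_kl:
            best, best_kl = cat, kl
    return best
```

For example, a document sharing most of its vocabulary with the "sports" training documents is assigned to that category, since its distribution diverges least from the sports category model.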

Identifier

hal-01392500

https://hal.archives-ouvertes.fr/hal-01392500

DOI : 10.1007/3-540-36618-0_22

Language(s)

en

Publisher

HAL CCSD

Springer Berlin Heidelberg

Relation

info:eu-repo/semantics/altIdentifier/doi/10.1007/3-540-36618-0_22

Source

Advances in Information Retrieval


Advances in Information Retrieval, vol. 2633, Springer Berlin Heidelberg, pp. 305-319, 2003. DOI: 10.1007/3-540-36618-0_22

Keywords #Text categorization #Kullback-Leibler Divergence #[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] #[SHS.INFO] Humanities and Social Sciences/Library and information sciences
Type

info:eu-repo/semantics/bookPart

Book section