
Noise-tolerance feasibility for restricted-domain Information Retrieval systems

**Author(s):** Vila Rodríguez, Katia; Fernández Orquín, Antonio; Gómez, José M.; Ferrández, Antonio; Díaz, Josval
Contributor(s)	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos; Procesamiento del Lenguaje y Sistemas de Información (GPLSI)
Date(s)	08/09/2014 08/09/2014 01/07/2013
Abstract	Information Retrieval systems normally have to work with rather heterogeneous sources, such as Web sites or documents produced by Optical Character Recognition tools. Correctly converting these sources into flat text files is not a trivial task, since noise may easily be introduced as a result of spelling or typesetting errors. Interestingly, this is not a great drawback when the corpus is sufficiently large, since redundancy helps to overcome noise problems. However, noise becomes a serious problem in restricted-domain Information Retrieval, especially when the corpus is small and has little or no redundancy. This paper devises an approach which adds noise-tolerance to Information Retrieval systems. A set of experiments carried out in the agricultural domain proves the effectiveness of the approach presented.
Identifier	Data & Knowledge Engineering. 2013, 86: 276-294. doi:10.1016/j.datak.2013.02.002; 0169-023X (Print); 1872-6933 (Online); http://hdl.handle.net/10045/40115
Language(s)	eng
Publisher	Elsevier
Relation	http://dx.doi.org/10.1016/j.datak.2013.02.002
Rights	info:eu-repo/semantics/openAccess
Keywords	#Information retrieval #Noise-tolerance #Restricted domain #Edit distance #Lenguajes y Sistemas Informáticos
Type	info:eu-repo/semantics/article
