An integer linear programming approach for approximate string comparison


Autoria(s): RITT, Marcus; COSTA, Alysson M.; MERGEN, Sergio; ORENGO, Viviane M.
Contribuinte(s)

UNIVERSIDADE DE SÃO PAULO

Data(s)

20/10/2012

20/10/2012

2009

Resumo

We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance. (C) 2008 Elsevier B.V. All rights reserved.

Brazilian Ministry of Education

Ministério da Educação do Brasil (MEC)

CAPES

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Identificador

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, v.198, n.3, p.706-714, 2009

0377-2217

http://producao.usp.br/handle/BDPI/28938

10.1016/j.ejor.2008.10.013

http://dx.doi.org/10.1016/j.ejor.2008.10.013

Idioma(s)

eng

Publicador

ELSEVIER SCIENCE BV

Relação

European Journal of Operational Research

Direitos

restrictedAccess

Copyright ELSEVIER SCIENCE BV

Palavras-Chave #Approximate string matching #Distance metric #Block edits #Block moves #Integer programming #Hop constraints #Management #Operations Research & Management Science
Tipo

article

original article

publishedVersion