Active learning in keyword search-based data integration


Author(s): Yan, Zhepeng; Zheng, Nan; Ives, Zachary G; Talukdar, Partha Pratim; Yu, Cong
Date(s)

2015

Abstract

The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration, where the system lazily discovers associations enabling it to join together matches to keywords, and return ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few "top-k" results: this result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.

Format

application/pdf

Identifier

http://eprints.iisc.ernet.in/52549/1/VLDB_Jou_24-5_611_2015.pdf

Yan, Zhepeng and Zheng, Nan and Ives, Zachary G and Talukdar, Partha Pratim and Yu, Cong (2015) Active learning in keyword search-based data integration. In: VLDB JOURNAL, 24 (5, SI). pp. 611-631.

Publisher

SPRINGER

Relation

http://dx.doi.org/10.1007/s00778-014-0374-x

http://eprints.iisc.ernet.in/52549/

Keywords

Supercomputer Education & Research Centre

Type

Journal Article

PeerReviewed