Top-k entity augmentation using consistent set covering
J. Eberius, M. Thiele, K. Braunschweig, and W. Lehner. Proceedings of the 27th International Conference on Scientific and Statistical Database Management, New York, NY, USA, ACM, (June 2015)
Abstract
Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.
%0 Conference Paper
%1 Eberius2015-ck
%A Eberius, Julian
%A Thiele, Maik
%A Braunschweig, Katrin
%A Lehner, Wolfgang
%B Proceedings of the 27th International Conference on Scientific and Statistical Database Management
%C New York, NY, USA
%D 2015
%I ACM
%K imported
%T Top-k entity augmentation using consistent set covering
%X Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.
@inproceedings{Eberius2015-ck,
abstract = {Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.},
added-at = {2024-10-02T10:38:17.000+0200},
address = {New York, NY, USA},
author = {Eberius, Julian and Thiele, Maik and Braunschweig, Katrin and Lehner, Wolfgang},
biburl = {https://puma.scadsai.uni-leipzig.de/bibtex/2c6adf56eea2b46a5f394d6423cea8378/scadsfct},
booktitle = {Proceedings of the 27th International Conference on Scientific and Statistical Database Management},
conference = {SSDBM 2015: International Conference on Scientific and Statistical Database Management},
copyright = {http://www.acm.org/publications/policies/copyright\_policy\#Background},
interhash = {f79c534f79f0a7d5842ca93be25b32f1},
intrahash = {c6adf56eea2b46a5f394d6423cea8378},
keywords = {imported},
location = {La Jolla, California},
month = jun,
publisher = {ACM},
timestamp = {2024-10-02T10:38:17.000+0200},
title = {Top-k entity augmentation using consistent set covering},
year = 2015
}