Multi-source dataset of e-commerce products with attributes for property matching

D. Ayala, I. Hernández, D. Ruiz, and E. Rahm. Data Brief, 41 (107884): 107884 (April 2022)

Abstract

Schema/ontology matching consists of finding matches between types, properties and entities in heterogeneous data sources in order to integrate them, a task that has become increasingly relevant with the development of web technologies and open data initiatives. One of the tasks involved is the matching of data properties, which attempts to find correspondences between the attributes of entities. This is challenging because equivalent properties sometimes have different names. Furthermore, some properties may not be equivalent but still match in 1..n relationships. These difficulties create the need for varied evaluation datasets, for two reasons. First, such datasets are needed to evaluate existing techniques in a variety of scenarios. Second, they enable the training of supervised techniques, which may even become context-independent if trained with data from sufficiently diverse contexts. To support the evaluation and training of data property matching techniques, we present a dataset collection consisting of product records from four different contexts. These datasets are the result of transforming two existing datasets. In one of them, some properties were filtered out for being too noisy. The resulting processed dataset consists of JSON files listing the product records and their properties, together with a separate grouping of the properties that determines which ones match. It contains information about 2860 entities, with 4386 properties and 13350 pairwise matches.
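The abstract describes the match annotations as a grouping of properties rather than as an explicit list of pairs. The following is a minimal Python sketch of how such a grouping could be expanded into pairwise matches, assuming that every pair of properties within the same group counts as one match. The property identifiers and the list-of-lists layout are hypothetical and do not reflect the dataset's actual JSON schema.

```python
from itertools import combinations

# Hypothetical grouping: each inner list holds property identifiers that
# are considered to match. The real files in the dataset may be laid out
# differently; this is an illustrative assumption only.
groups = [
    ["shop_a/brand", "shop_b/manufacturer", "shop_c/make"],
    ["shop_a/screen_size", "shop_b/display"],
    ["shop_c/weight"],  # a property with no match forms a singleton group
]


def pairwise_matches(groups):
    """Expand property groups into a set of unordered matching pairs.

    Each pair is stored as a sorted tuple so that (a, b) and (b, a)
    are represented identically.
    """
    pairs = set()
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


def is_match(pairs, a, b):
    """Order-insensitive lookup of whether two properties match."""
    return (min(a, b), max(a, b)) in pairs


matches = pairwise_matches(groups)
print(len(matches))  # 4 (3 pairs from the first group, 1 from the second)
```

Under this assumption, a group of n properties contributes n(n-1)/2 pairs, so the number of pairwise matches can exceed the number of properties; how the published figure of 13350 was derived should be confirmed against the dataset itself.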

Cite this publication

@article{Ayala2022-id,
  abstract = {Schema/ontology matching consists in finding matches between types, properties and entities in heterogeneous sources of data in order to integrate them, which has become increasingly relevant with the development of web technologies and open data initiatives. One of the involved tasks is the matching of data properties, which attempts to try to find correspondences between the attributes of the entities. This is challenging due to the at times different names of equivalent properties. Furthermore, some properties may not be equivalent, but still match in 1..n relationships. These difficulties create the need for varied evaluation datasets for two reasons. First, they are needed to evaluate existing techniques in a variety of scenarios. Second, they enable the training of supervised techniques that may even become context-independent if trained with data from diverse enough contexts. To support the evaluation and training of data property matching techniques, we present a collection dataset consisting of product records from four different contexts. These datasets are the result of transforming two different existing datasets. In one of the datasets, some properties were filtered for being too noisy. The resulting processed dataset consists of json files with a listing of the product records and their properties, and a separate grouping of the properties that determines which ones match. It contains information about 2860 entities, with 4386 properties and 13350 pairwise matches.},
  added-at = {2024-09-10T11:54:51.000+0200},
  author = {Ayala, Daniel and Hern{\'a}ndez, Inma and Ruiz, David and Rahm, Erhard},
  biburl = {https://puma.scadsai.uni-leipzig.de/bibtex/2966a18c0cfc5564f9841919ef698c134/scadsfct},
  copyright = {http://creativecommons.org/licenses/by/4.0/},
  interhash = {b09c70792fa7fd35b9715fb94c7ba978},
  intrahash = {966a18c0cfc5564f9841919ef698c134},
  journal = {Data Brief},
  keywords = {Ontology Property area_bigdata data engineering integration matching zno},
  language = {en},
  month = {04},
  number = 107884,
  pages = 107884,
  publisher = {Elsevier BV},
  timestamp = {2025-07-29T11:50:19.000+0200},
  title = {Multi-source dataset of e-commerce products with attributes for property matching},
  volume = 41,
  year = 2022
}

PUMA
