Inproceedings,

Debiasing Vandalism Detection Models at Wikidata

S. Heindorf, Y. Scholten, G. Engels, and M. Potthast.
The World Wide Web Conference, pages 670–680. New York, NY, USA, Association for Computing Machinery, (2019)
DOI: 10.1145/3308558.3313507

Abstract

Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism and employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from older ones, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC AUC and 0.316 PR AUC.

BibTeX key: 10.1145/3308558.3313507
entry type: inproceedings
address: New York, NY, USA
booktitle: The World Wide Web Conference
year: 2019
pages: 670–680
publisher: Association for Computing Machinery
series: WWW '19
isbn: 9781450366748
numpages: 11
location: San Francisco, CA, USA
DOI: 10.1145/3308558.3313507
url: https://doi.org/10.1145/3308558.3313507
