Debiasing Vandalism Detection Models at Wikidata

The World Wide Web Conference, pages 670–680. New York, NY, USA, Association for Computing Machinery, (2019)
DOI: 10.1145/3308558.3313507

Abstract

Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and they employ machine learning-based approaches to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from more established users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC-AUC and 0.316 PR-AUC.
