Generalizing Unmasking for Short Texts

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 654--659. Minneapolis, Minnesota, Association for Computational Linguistics (June 2019)
DOI: 10.18653/v1/N19-1068

Abstract

Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75--80%, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.
