J. Bevendorff, B. Stein, M. Hagen, and M. Potthast. Generalizing Unmasking for Short Texts. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 654--659. Minneapolis, Minnesota, Association for Computational Linguistics (June 2019)
DOI: 10.18653/v1/N19-1068
Abstract
Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75–80%, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.
%0 Conference Paper
%1 bevendorff-etal-2019-generalizing
%A Bevendorff, Janek
%A Stein, Benno
%A Hagen, Matthias
%A Potthast, Martin
%B Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
%C Minneapolis, Minnesota
%D 2019
%E Burstein, Jill
%E Doran, Christy
%E Solorio, Thamar
%I Association for Computational Linguistics
%K imported
%P 654--659
%R 10.18653/v1/N19-1068
%T Generalizing Unmasking for Short Texts
%U https://aclanthology.org/N19-1068
%X Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75--80\%, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.
@inproceedings{bevendorff-etal-2019-generalizing,
abstract = {Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75{--}80{\%}, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.},
added-at = {2024-10-02T10:38:17.000+0200},
address = {Minneapolis, Minnesota},
author = {Bevendorff, Janek and Stein, Benno and Hagen, Matthias and Potthast, Martin},
biburl = {https://puma.scadsai.uni-leipzig.de/bibtex/28422dbdc387f1914f1d4e87c3aed2eab/scadsfct},
booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
doi = {10.18653/v1/N19-1068},
editor = {Burstein, Jill and Doran, Christy and Solorio, Thamar},
interhash = {c009df86945e1187597b2d8f7bb459bf},
intrahash = {8422dbdc387f1914f1d4e87c3aed2eab},
keywords = {imported},
month = jun,
pages = {654--659},
publisher = {Association for Computational Linguistics},
timestamp = {2024-10-02T10:38:17.000+0200},
title = {Generalizing Unmasking for Short Texts},
url = {https://aclanthology.org/N19-1068},
year = 2019
}