Inproceedings

Overview of the cross-domain authorship verification task at PAN 2021

CLEF-WN 2021 - Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, volume 2936 of CEUR Workshop Proceedings, pages 1743--1759. CEUR-WS, 2021. Conference date: 21-09-2021 through 24-09-2021.

Abstract

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Idiosyncrasies in human writing styles make it difficult to develop systems for authorship identification that scale well across individuals. In this year's edition of PAN, the authorship identification track focused on open-set authorship verification, so that systems are applied to unknown documents by previously unseen authors in a new domain. As in the previous year, the sizable materials for this campaign were sampled from English-language fanfiction. The calibration materials handed out to the participants were the same as last year, but a new test set was compiled with authors and fandom domains not present in any of the previous datasets. The general setup of the task did not change, i.e., systems still had to estimate the probability of a pair of documents being authored by the same person. We attracted 13 submissions by 10 international teams, which were compared to three complementary baselines, using five diverse evaluation metrics. Post-hoc analyses show that systems benefitted from the abundant calibration materials and were well-equipped to handle the open-set scenario: both the top-performing approach and the highly competitive cohort of runners-up presented surprisingly strong verifiers. We conclude that, at least within this specific text variety, (large-scale) open-set authorship verification is not necessarily or inherently more difficult than a closed-set setup, which offers encouraging perspectives for the future of the field.
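To make the task setup concrete: a verifier receives a pair of documents and returns a score in [0, 1], interpreted as the probability that both were written by the same person. The sketch below is not one of the shared task's baselines or submitted systems; it is a minimal illustration, assuming character n-gram profiles compared with cosine similarity, with all function names (char_ngrams, same_author_score) being hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 4) -> Counter:
    """Count character n-grams, a common stylistic feature in authorship verification."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def same_author_score(doc_a: str, doc_b: str, n: int = 4) -> float:
    """Cosine similarity of character n-gram profiles, read as a rough
    same-author score in [0, 1]; 0.5 is returned when no comparison is possible."""
    a, b = char_ngrams(doc_a, n), char_ngrams(doc_b, n)
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.5


# Example: score one document pair for shared authorship.
score = same_author_score(
    "The rain fell softly over the harbour.",
    "Rain kept falling softly over the old harbour wall.",
)
print(f"same-author score: {score:.3f}")
```

Note that a raw cosine similarity is not a calibrated probability; actual shared-task systems typically calibrate their scores on the provided training pairs before evaluation.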
