RoMe: A Robust Metric for Evaluating Natural Language Generation. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 5645--5657, Association for Computational Linguistics, Dublin, Ireland, May 2022.
A human-machine collaborative framework for evaluating malevolence in dialogues. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 5612--5623, 2021.
Stable bias: evaluating societal representations in diffusion models. Proceedings of the 37th International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, 2024.