Inproceedings,

The Impact of Negative Relevance Judgments on NDCG

Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 2037–2040. New York, NY, USA, Association for Computing Machinery, (2020)
DOI: 10.1145/3340531.3412123

Abstract

NDCG is one of the most commonly used measures to quantify system performance in retrieval experiments. Though originally not considered, graded relevance judgments nowadays frequently include negative labels. Negative relevance labels cause NDCG to be unbounded. This is probably why widely used implementations of NDCG map negative relevance labels to zero, thus ensuring the resulting scores to originate from the [0, 1] range. But zeroing negative labels discards valuable relevance information, e.g., by treating spam documents the same as unjudged ones, which are assigned the relevance label of zero by default. We show that, instead of zeroing negative labels, a min-max-normalization of NDCG retains its statistical power while improving its reliability and stability.
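The contrast the abstract draws can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a linear gain with a log2 rank discount, and it reads "min-max-normalization" as rescaling DCG between the worst and the best achievable DCG for the judged pool. The function names (`dcg`, `ndcg_zeroed`, `ndcg_minmax`) and the example labels are hypothetical.

```python
import math


def dcg(labels):
    """DCG with linear gain and log2 discount (an assumed variant)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(labels, start=1))


def ndcg_zeroed(ranking, pool):
    """NDCG after mapping negative labels to zero."""
    clipped_ranking = [max(rel, 0) for rel in ranking]
    clipped_pool = [max(rel, 0) for rel in pool]
    ideal = dcg(sorted(clipped_pool, reverse=True)[:len(ranking)])
    return dcg(clipped_ranking) / ideal if ideal > 0 else 0.0


def ndcg_minmax(ranking, pool):
    """DCG rescaled between the worst and best achievable DCG.

    Assumption: the worst ranking places the lowest labels first,
    the best ranking places the highest labels first.
    """
    k = len(ranking)
    best = dcg(sorted(pool, reverse=True)[:k])
    worst = dcg(sorted(pool)[:k])
    if best == worst:
        return 0.0
    return (dcg(ranking) - worst) / (best - worst)


# Hypothetical judged pool: 2 = highly relevant, -2 = spam.
pool = [2, 1, 0, -2]
run_a = [2, -2, 1, 0]   # spam at rank 2
run_b = [2, 0, 1, -2]   # spam at rank 4

print(ndcg_zeroed(run_a, pool), ndcg_zeroed(run_b, pool))
print(ndcg_minmax(run_a, pool), ndcg_minmax(run_b, pool))
```

Under these assumptions, zeroing scores both runs identically because the spam document becomes indistinguishable from the zero-labeled one, whereas the min-max variant stays within [0, 1] and ranks the run that places spam lower as the better one.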
