Abstract
NDCG is one of the most commonly used measures for quantifying system performance in retrieval experiments. Although not originally considered, negative labels are now frequently included in graded relevance judgments. Negative relevance labels cause NDCG to be unbounded. This is probably why widely used implementations of NDCG map negative relevance labels to zero, which ensures that the resulting scores fall within the [0, 1] range. But zeroing negative labels discards valuable relevance information, e.g., by treating spam documents the same as unjudged ones, which are assigned a relevance label of zero by default. We show that, instead of zeroing negative labels, a min-max normalization of NDCG retains its statistical power while improving its reliability and stability.
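The contrast between zeroing and min-max normalization can be illustrated with a minimal sketch. It is not taken from the paper: the abstract does not give the exact definition, so the code assumes a linear gain, a log2 rank discount, no rank cutoff, and that the best and worst rankings are the descending and ascending orderings of the labels in the ranked list itself. The function names `dcg`, `ndcg_zeroed`, and `ndcg_minmax` are hypothetical.

```python
import math


def dcg(labels):
    """Discounted cumulative gain with linear gain and log2 discount (assumed)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))


def ndcg_zeroed(labels):
    """NDCG after mapping negative labels to zero, as the abstract describes
    for widely used implementations."""
    clipped = [max(rel, 0) for rel in labels]
    ideal = dcg(sorted(clipped, reverse=True))
    return dcg(clipped) / ideal if ideal > 0 else 0.0


def ndcg_minmax(labels):
    """Min-max-normalized DCG (one plausible reading, not the paper's formula).

    The best ranking sorts labels in descending order; the worst ranking
    sorts them in ascending order, which minimizes DCG because the largest
    discount weights then meet the smallest labels.
    """
    best = dcg(sorted(labels, reverse=True))
    worst = dcg(sorted(labels))
    if best == worst:  # all labels equal: every ordering scores the same
        return 0.0
    return (dcg(labels) - worst) / (best - worst)


# Labels: 2 = relevant, 0 = unjudged or non-relevant, -1 = spam.
spam_above_unjudged = [2, -1, 0]
spam_below_unjudged = [2, 0, -1]

# Zeroing cannot tell the two rankings apart.
print(ndcg_zeroed(spam_above_unjudged), ndcg_zeroed(spam_below_unjudged))

# Min-max normalization penalizes ranking spam above the unjudged document.
print(ndcg_minmax(spam_above_unjudged), ndcg_minmax(spam_below_unjudged))
```

Under these assumptions, the zeroed variant gives both rankings the same score, while the min-max variant stays within [0, 1] and scores the ranking that places spam higher strictly lower.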