Self-Normalized Off-Policy Estimators for Ranking

Abstract

We propose two new estimators for off-policy evaluation of ranking policies, based on the idea of self-normalization. Importantly, these estimators are parameter-free and asymptotically unbiased. Experiments with synthetic data demonstrate that our estimators can be more accurate than other importance-weighting estimators, owing to their ability to control variance while adding minimal bias. From this, we conclude that self-normalization offers an optimal balance of accuracy and practicality for off-policy ranker evaluation.
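
To illustrate the general idea of self-normalization, the following minimal sketch contrasts a plain importance-weighted (IPS) estimate with its self-normalized counterpart in a simple single-action setting. It is not one of the two ranking estimators proposed in the paper, whose definitions are not given in this abstract; the function names and the toy data are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: generic importance weighting vs. self-normalization
# for a single-action logged-feedback setting. This is NOT the paper's ranking
# estimators; all names and numbers here are assumptions made for the example.

def importance_weights(target_probs, logging_probs):
    """Ratio of target-policy to logging-policy probability for each logged action."""
    return [p / q for p, q in zip(target_probs, logging_probs)]

def ips_estimate(rewards, target_probs, logging_probs):
    """Plain importance-weighted estimate: weighted rewards divided by the sample count."""
    weights = importance_weights(target_probs, logging_probs)
    weighted_sum = sum(w * r for w, r in zip(weights, rewards))
    return weighted_sum / len(rewards)

def snips_estimate(rewards, target_probs, logging_probs):
    """Self-normalized estimate: weighted rewards divided by the sum of the weights."""
    weights = importance_weights(target_probs, logging_probs)
    weighted_sum = sum(w * r for w, r in zip(weights, rewards))
    return weighted_sum / sum(weights)

# Hypothetical logged data: binary rewards with policy probabilities per logged action.
rewards = [1.0, 0.0, 1.0, 1.0]
target_probs = [0.5, 0.2, 0.75, 0.4]
logging_probs = [0.25, 0.4, 0.25, 0.8]

ips = ips_estimate(rewards, target_probs, logging_probs)
snips = snips_estimate(rewards, target_probs, logging_probs)
print(f"IPS: {ips:.3f}  self-normalized: {snips:.3f}")
```

On this toy data the plain estimate is 1.375, which lies outside the [0, 1] range of the rewards, whereas the self-normalized estimate is about 0.917. Because the self-normalized estimate is a weighted average of the observed rewards, it always stays within their range. This is one way in which dividing by the sum of the weights limits variance, at the cost of a bias that vanishes as the sample grows.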

Publication
CONSEQUENCES Workshop – RecSys