Contextual Position Bias Estimation Using a Single Stochastic Logging Policy

Abstract

Addressing position bias is pivotal for unbiased off-policy training and evaluation in Learning to Rank (LTR). Doing so requires accurate estimates of the probabilities that users examine the slots where items are displayed, which in many applications are likely to depend on multiple factors, e.g. the screen size. The position bias is then no longer described by a single fixed curve, but by one that varies with the context. Existing position-bias estimators are either non-contextual or require multiple deployed ranking policies. We propose a novel contextual position-bias estimator that requires only the propensities logged from a single stochastic logging policy. Empirical evaluations assess the accuracy of the model in recovering the position-bias curves as well as its impact on off-policy evaluation, showing that a contextual position-bias estimator can deliver better reward estimates, which are also more robust to non-stationarity, than a non-contextual one.

Publication
LERI Workshop – RecSys