Control Variate Diagnostics for Detecting Problems in Logged Bandit Feedback

Abstract

We propose diagnostics based on control variates to detect data quality issues in logged bandit feedback, the data on which accurate offline evaluation and training of recommendation policies critically depend. Our diagnostics can provably detect two common types of data issues: (1) the policy that logged the data was insufficiently randomized; (2) the logged propensity values are incorrect due to downstream filtering. We establish bounds on the false positive and false negative rates of our diagnostics, then empirically validate our approach on synthetic data.
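
As a rough illustration of the idea, and not necessarily the exact diagnostic proposed here, the sketch below uses a standard control variate for logged bandit data: when the logged propensities are correct and the logging policy gives every action non-zero probability, the importance weight of any target policy has expectation 1. The uniform target policy, the z-score threshold, the simulated action probabilities, and the function names `control_variate_check` and `log_weights` are all assumptions made for this example.

```python
import math
import random


def control_variate_check(weights, z=3.0):
    """Return (mean weight, flag); flag is True if the mean is far from 1.

    Assumed test: a normal-approximation interval of z standard errors.
    """
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)
    se = math.sqrt(var / n)
    return mean, abs(mean - 1.0) > z * se


def log_weights(probs, n, rng, drop_action=None, drop_rate=0.0):
    """Simulate a context-free log and return importance weights
    of a uniform target policy against the logged propensities.

    drop_action / drop_rate mimic downstream filtering: records for one
    action are discarded after logging, so the stored propensities no
    longer describe the data that remains.
    """
    k = len(probs)
    weights = []
    for a in rng.choices(range(k), weights=probs, k=n):
        if a == drop_action and rng.random() < drop_rate:
            continue  # record lost to filtering
        weights.append((1.0 / k) / probs[a])
    return weights


rng = random.Random(0)

# Healthy log: full support, correct propensities.
clean = log_weights([0.5, 0.3, 0.2], 20000, rng)
# Issue (2): half of the action-0 records are filtered out downstream.
filtered = log_weights([0.5, 0.3, 0.2], 20000, rng, drop_action=0, drop_rate=0.5)
# Issue (1): the logging policy never plays action 2.
narrow = log_weights([0.6, 0.4, 0.0], 20000, rng)

mean_clean, flag_clean = control_variate_check(clean)
mean_filtered, flag_filtered = control_variate_check(filtered)
mean_narrow, flag_narrow = control_variate_check(narrow)

print("clean   ", round(mean_clean, 3), flag_clean)
print("filtered", round(mean_filtered, 3), flag_filtered)
print("narrow  ", round(mean_narrow, 3), flag_narrow)
```

In this toy setup the mean weight stays near 1 for the healthy log, drifts above 1 when records are filtered, and falls toward 2/3 when one of three actions is never logged. A single target policy can miss some errors, so a check of this kind is a screening step under the stated assumptions, not a guarantee.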

Publication
CONSEQUENCES+REVEAL Workshop – RecSys