Two new papers at CONSEQUENCES+REVEAL 2022
I will be sharing two new papers – representing some of the work I’ve been doing for Amazon Music – at the CONSEQUENCES+REVEAL Workshop, which is part of the RecSys 2022 conference.
The first paper, “Control Variate Diagnostics for Detecting Problems in Logged Bandit Feedback,” deals with an important, practical problem in off-policy evaluation and learning that has received little attention in the literature. When using logged bandit feedback (such as data from a recommender system) for offline evaluation, it is important that the data collection policy (a.k.a. logging policy or behavior policy) is sufficiently randomized, and that the logs accurately record the propensities (i.e., probabilities) of the selected actions. These conditions allow for unbiased evaluation of a new policy; when they fail, reward (i.e., utility) estimates may be biased. We therefore need diagnostics to test whether the conditions are met. Toward this goal, we propose diagnostics based on control variates – statistics with known expectations. We frame each diagnostic as a hypothesis test: is it statistically plausible that the observed data came from a “good” data distribution? We analyze the diagnostics’ false positive and false negative rates, and conduct experiments on synthetic data to empirically validate their effectiveness. I will be giving a talk for this paper on Thursday, 9/22.
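To make the control-variate idea concrete, here is a minimal sketch of one diagnostic of this flavor – not the paper’s exact test. It rests on a standard fact: for any fixed target policy π, the importance weights π(a|x) / π₀(a|x) have expectation 1 when the logged propensities π₀ are correct, so a one-sample t-test against a mean of 1 can flag suspicious logs. The function name and the synthetic setup below are my own illustrative choices.

```python
import numpy as np
from scipy import stats


def importance_weight_diagnostic(target_probs, logged_propensities, alpha=0.05):
    """Illustrative control-variate diagnostic (not the paper's exact test).

    For any fixed target policy pi, the importance weights
    w = pi(a|x) / pi_0(a|x) have expectation 1 when the logged
    propensities pi_0 are correct, so w - 1 is a control variate.
    A sample mean far from 1 suggests miscalibrated propensities
    or insufficient randomization in the logging policy.
    """
    weights = np.asarray(target_probs) / np.asarray(logged_propensities)
    _, p_value = stats.ttest_1samp(weights, popmean=1.0)
    return p_value, bool(p_value < alpha)  # True => logs look problematic


# Toy check on synthetic logs: 5 actions, uniform logging policy.
rng = np.random.default_rng(0)
n, k = 10_000, 5
logged_actions = rng.integers(k, size=n)
target_policy = rng.dirichlet(np.ones(k))     # arbitrary fixed target policy
correct_props = np.full(n, 1.0 / k)           # true logging propensities
corrupted_props = np.full(n, 1.0 / (k - 1))   # systematically misreported

print(importance_weight_diagnostic(target_policy[logged_actions], correct_props))
print(importance_weight_diagnostic(target_policy[logged_actions], corrupted_props))
```

With the correct propensities the weights average to 1 and the test rarely rejects; with the misreported propensities the mean drifts to about 0.8, and the test flags the logs.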
The second paper, “Off-Policy Evaluation for Learning-to-Rank via Interpolating the Item-Position Model and the Position-Based Model,” deals with off-policy evaluation of ranking policies. Evaluating a ranking policy offline requires assuming a model of user interaction, and two popular choices are the position-based model (PBM) and the item-position model (IPM). The PBM yields an estimator that typically has low variance, but it requires a position bias curve that, when estimated inaccurately, can introduce substantial bias. The IPM estimator, by contrast, needs no position bias curve and is unbiased under reasonable assumptions, though at the cost of higher variance. To get the “best of both worlds,” we propose a new estimator, INTERPOL, that interpolates between the two models. We show that INTERPOL is unbiased whenever the PBM assumptions hold, and our experiments on synthetic data corroborate that it can achieve lower estimation error than either the PBM or the IPM for a properly chosen interpolation parameter. My collaborator, Alexander Buchholz, will be giving a talk for this paper on Friday, 9/23.
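For context, here is a schematic sketch of the two baseline estimators being interpolated. These are the usual forms from the click-model literature, not the paper’s exact notation; the function names, array shapes, and the assumption that `exam_curve` holds the position bias curve and `target_props` / `logged_props` hold per-(item, position) marginal propensities are all my own illustrative choices.

```python
import numpy as np


def pbm_estimate(clicks, target_ranks, logged_ranks, exam_curve):
    """Schematic PBM-based estimator (usual textbook form).

    Each logged click is reweighted by the ratio of examination
    probabilities at the rank the target policy assigns the item
    vs. the rank at which it was actually shown. Typically low
    variance, but biased whenever exam_curve (the position bias
    curve) is estimated inaccurately.
    """
    weights = exam_curve[target_ranks] / exam_curve[logged_ranks]
    return float(np.mean(np.sum(weights * clicks, axis=1)))


def ipm_estimate(clicks, target_props, logged_props):
    """Schematic IPM-based estimator.

    Inverse-propensity weighting per (item, position) pair: no
    position bias curve is needed and the estimate is unbiased
    under the item-position model, but the propensity ratios
    inflate its variance.
    """
    weights = target_props / logged_props
    return float(np.mean(np.sum(weights * clicks, axis=1)))
```

INTERPOL sits between these two extremes: its interpolation parameter trades the PBM’s reliance on an estimated position bias curve against the IPM’s variance, and the exact interpolation formula is spelled out in the paper.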