Glad you liked the post! Yes, A/B testing ML models with (user interaction) feedback loops is a great example of abstraction leakage. Thanks for bringing it up!
Fundamentally, this problem seems to have no “perfect” solution. Either we ignore the leak and let one model learn from the other’s exploration, or we isolate the feedback loops and halve the amount of training data each model sees. Either way, model performance during the test might not be representative of model performance once the test ends.
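To make the trade-off concrete, here’s a toy sketch (everything in it, including the click rates and epsilon-greedy setup, is an illustrative assumption, not a description of any real system). Two simple models split traffic in an A/B test; with a shared log each model also learns from the other’s exploration, while isolating the logs means each model trains on only its half of the traffic.

```python
import random

def run_test(shared_logs, steps=2000, eps=0.1, seed=0):
    """Toy A/B test: two epsilon-greedy models on a 2-armed bandit.
    Assumed true click rates for illustration: arm 0 -> 0.3, arm 1 -> 0.5."""
    rng = random.Random(seed)
    rates = [0.3, 0.5]
    # per-model [clicks, impressions] for each arm
    stats = [[[0, 0], [0, 0]] for _ in range(2)]

    def estimate(model, arm):
        clicks, shows = stats[model][arm]
        return clicks / shows if shows else 0.0

    for t in range(steps):
        m = t % 2                       # alternate traffic between the two models
        if rng.random() < eps:          # exploration
            arm = rng.randrange(2)
        else:                           # exploitation
            arm = max(range(2), key=lambda a: estimate(m, a))
        click = 1 if rng.random() < rates[arm] else 0
        # the leak: with a shared log, BOTH models learn from this interaction
        learners = (0, 1) if shared_logs else (m,)
        for lm in learners:
            stats[lm][arm][0] += click
            stats[lm][arm][1] += 1

    # impressions each model actually trained on
    return [sum(shows for _, shows in stats[m]) for m in range(2)]

print(run_test(shared_logs=True))   # → [2000, 2000]: each model trains on all traffic
print(run_test(shared_logs=False))  # → [1000, 1000]: isolation halves the training data
```

The impression counts show the dilemma directly: sharing the log doubles each model’s training data but lets each model free-ride on the other’s exploration, while isolation keeps the models independent at the cost of half the data.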
It has been a long time since I’ve personally looked into this topic, but I know several teams at Booking use different approaches to account for this leakage. Depending on the details of the application, different solutions might be preferable.