All four reviewers reached a consensus that the paper passes the acceptance bar for NeurIPS. Despite its incremental nature, the proposed approach achieves strong results in challenging scenarios compared to previous methods. The AC agrees with the reviewers’ recommendation. However, as pointed out by R3, the claim that the method doesn’t need manual annotations is inaccurate, given the need for solo videos. The authors need to fix this issue, clearly articulate the limitations of the method, and incorporate the discussion from the rebuttal into the final version of the paper.