NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
* This study addresses an important problem in recommendation systems, namely candidate set generation.
* The idea is benchmarked on popular metrics (precision, recall, F-measure) over two public datasets, with both offline and online experiments.
* The baselines are also popular, standard approaches (DNN, ITEM-CF). The presented results beat these baselines by a good margin, and the online test results in particular are very good.
* I am not sure the novelty of this paper is sufficient for NeurIPS. It is an incremental improvement to an existing model (TDM), obtained by adding an optimization step. The resulting improvement is impressive, though, and the work feels better suited to an applied data science conference such as KDD or WWW.
* Portions of the paper are difficult to follow, especially the explanation of the Joint Optimization Framework. The explanation of TDM in Section 2.1 is helpful, but a direct comparison of the tree-building steps in TDM and in the proposed method would be even more helpful. For example, a side-by-side comparison of Algorithms 1 & 2 with their TDM predecessors would go a long way toward clarifying the detailed differences.
* Moving to iterative joint learning would likely increase training time (and may pose infrastructure challenges for optimizing the tree hierarchy). It would be good if the authors could provide a tradeoff analysis or comparison here. This might also explain why the paper does not mention whether the method has been deployed to production (although online A/B results are shown).
* It would be good to discuss some of the more practical aspects, such as how many levels of the tree are chosen and how sensitive the algorithm is to parameters of this kind.
* Figure 2: the Clustering algorithm seems to outperform JTM in the first few iterations, so I would be curious about the intuitive explanation for this.
* Although the paper mentions that an A/B experiment was performed to evaluate CTR and RPM (no statistical significance is reported), nowhere does it say whether the method was ultimately deployed to the full production system. It would be good to have clarity on this in the paper.
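For reference, the offline metrics named above (precision, recall, F-measure) can be computed for a single user's candidate set roughly as follows. This is a minimal sketch, not the paper's evaluation code: the function name `precision_recall_f1` and the toy item IDs are assumptions made for illustration, and the paper's exact evaluation protocol (for example, the candidate set size) is not reproduced here.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F-measure for one user.

    recommended: items returned by the candidate generator (hypothetical input)
    relevant:    items the user actually interacted with (hypothetical input)
    """
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)  # recommended items that are also relevant
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Toy example: 4 recommended items, 3 relevant items, 2 in common
p, r, f = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
print(p, r, f)
```

In practice these per-user values would be averaged over all test users.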
Reviewer 2
The paper proposes a joint model that simultaneously learns an item tree index and a user representation to support efficient retrieval. The tree learning algorithm is formulated as maximum weight matching in a bipartite graph, for which an approximate algorithm is proposed. The authors further propose to iteratively leverage the tree index to infer a hierarchical user representation, which is shown to improve recommendation accuracy. The paper is well written and addresses an important problem in collaborative filtering, where fast retrieval becomes critically important at scale. I think the proposed approach is sound, and I particularly like the hierarchical tree-based user representation. It is quite different from most existing approaches, which are either neighbour-based or use inner-product latent representations. The experimental section provides a thorough comparison using large public datasets. It is also useful to see that the method performs well in a live production setting. Here, I would also have liked to see training and inference times, given that Algorithm 2 can be expensive even with the proposed approximations. I think the authors need to quantify this overhead and compare it to other approaches, particularly since the method is targeted at very large datasets.
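To make the cost concern concrete, the sketch below shows a generic greedy heuristic for a capacity-constrained maximum weight assignment of items to tree nodes. It is not the paper's Algorithm 2: the function `greedy_balanced_assignment`, the `weights` matrix, and the `capacity` parameter are all hypothetical and chosen only for illustration. Even this simple heuristic sorts every item-node edge, which hints at why the overhead should be measured on large item sets.

```python
def greedy_balanced_assignment(weights, capacity):
    """Greedily assign each item to one node, at most `capacity` items per node.

    weights[i][j] is a hypothetical score for placing item i under node j.
    Assumes len(weights) <= capacity * number_of_nodes, so every item fits.
    This is a heuristic: it does not guarantee the optimal matching.
    """
    n_items = len(weights)
    n_nodes = len(weights[0])
    # Consider all item-node edges, heaviest first (ties broken by index)
    edges = sorted(
        ((weights[i][j], i, j) for i in range(n_items) for j in range(n_nodes)),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    assignment = {}
    load = [0] * n_nodes
    for _, i, j in edges:
        if i in assignment or load[j] >= capacity:
            continue  # item already placed, or node already full
        assignment[i] = j
        load[j] += 1
    return assignment


# Toy example: 4 items, 2 nodes, 2 items per node
weights = [
    [0.9, 0.1],
    [0.8, 0.7],
    [0.6, 0.5],
    [0.2, 0.4],
]
assignment = greedy_balanced_assignment(weights, capacity=2)
total = sum(weights[i][j] for i, j in assignment.items())
print(assignment, total)
```

An exact solver would replace the greedy loop at a higher computational cost, which is the tradeoff the requested timing comparison would quantify.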
Reviewer 3
Originality: The authors propose to extend the tree-based deep model (TDM) for recommendation by jointly optimizing the tree structure. The idea of joint learning itself is somewhat straightforward. The proposed algorithm is also somewhat straightforward, but it has the advantage of being simple and clear, which will allow future research to build upon it easily.
Quality: The contribution of this paper is mostly empirical, and the performance of the proposed approach is impressive in both online and offline evaluations. The authors do a great job of including relevant baselines in the offline analysis, which allows the impact of each design choice to be measured (e.g., joint learning vs. learning only the deep NN parameters). The offline experiments are conducted on public datasets, so they should be replicable in the future.
Clarity: The paper is clearly written. The authors provide a sufficient literature survey, and the paper is mostly self-contained. However, it would have been better if the network structure of the proposed model had been described.
Significance: The proposed method is a straightforward but clearly useful extension of TDM. It provides significant improvements over TDM and other strong baseline models, and therefore has a good chance of being widely used in large-scale recommender systems.