Reviews: Learning Conditional Deformable Templates with Convolutional Networks

Reviewer 1

Originality: I had a difficult time evaluating the innovation of their method. In terms of image classification, it is not very novel. Austerweil and Griffiths (NeurIPS 2010) proposed a similar idea, albeit in a Bayesian nonparametric framework (for a faster inference method, see Hu et al., ICML 2012: https://arxiv.org/pdf/1206.6482.pdf). This is just within the Bayesian nonparametrics community, so it is likely that more sophisticated methods for learning templates and transformations simultaneously exist in the computer vision literature. Their description of relevant previous work on neural image registration methods (Lines 62-84) was too sparse for me to evaluate the contribution.

Quality: The framework and methods are sensible and well-executed (although I was disappointed that, in their answers to the Reproducibility Checklist, the authors did not commit to posting their code and data with a final paper). Their evaluation was a bit weak because there were no comparisons to the other methods mentioned in the previous work section. Even a comparison on a small subset, whether because the other methods are slow or because they perform better, would improve the submission by providing more context for evaluating the method and contribution.

Clarity: The paper is very well-written, with clear descriptions of their data, analyses, and architecture.

Significance: I'm not sure if this would be of interest to researchers beyond the neuroimaging community. It is hard for me to evaluate its significance for the reasons mentioned above.

Response to Author Feedback: Thank you for your thoughtful response to my and the other reviewers' criticism. My main concern is this: if the manuscript focuses on neuroimaging analysis and is not meant to be compared with, or motivated by, the perception/image registration literatures, then why spend so much of the manuscript on that material rather than on neuroimaging analyses? If it is strong enough for NeurIPS on the neuroimaging work alone, then that would be sufficient for me. If the authors wanted to illustrate its potential, then I believe they should address other solutions to the template deformation problem and explain why those should not be applied to neuroimaging instead of their own approach. I do not believe the contributions outside of neuroimaging are strong enough to be of low-medium or greater significance to members of those related communities. Additionally, illustrating model comparisons in the main submission would greatly strengthen it. I had listed that as an improvement that would potentially increase my score, but it was left unaddressed in their response.

Reviewer 2

Overall
* the presented idea is interesting and well-suited for NIPS
* the methods are well explained in general
* the experiments are serious

Little information is given on the training parameters and on the neural network architecture, hence it is hard to judge how strong and efficient the solution is.

major
-----
Having multiple templates calls for a (difficult) model selection step. Also, it is likely that conditional templates are of lower quality than unconditional ones: this should be investigated and documented. Overall, I don't feel very enthusiastic about the idea of multiple templates.

Hyperparameters: gamma, lambda_d, lambda_a are apparently set arbitrarily: this is bad and typically limits the use of the method.

Fig. 6 is of poor quality and its labels are not readable.

It is a pity that the examples on digits are far more developed than the experiments on T1 images. I would have been much more interested in a serious experiment on brain scans, involving e.g. age prediction.

minor
-----
It is awkward to denote the Laplacian of the graph by A, as a stands for the template parameter.

l.30 "If the template does not adequately capture the dataset variability": but isn't the variability captured by the deformation model rather than by the template?

l.59 I feel uncomfortable with assertions such as "For example, in studying disease impact, it is helpful to register scans to age-specific templates rather than one covering a wide age range." Is this true? I think that this really depends on what your question is.

Reviewer 3

Training both the templates and the deformations of a registration procedure end-to-end is novel and very useful. The loss function has been thoroughly explained.

The explanation for the deformation prior p(V) could be more clearly motivated. It is not clear to me how deformations at different scales are weighted. For example, in the case of fMRI registration, I would expect pretty much all images to be rotated and translated as a whole (which should not be given a strong loss), but also deformed locally (which would incur a larger loss). It is not clear to me how these different transformation penalties would be controlled. Perhaps this is regulated by the lambda parameters, but I could not deduce it from the text.

I would like to see some more intermediate outputs of the model. For example, it would be nice to see velocity maps.

There is no promise in the paper that source code will be released. It would be helpful if the authors clarified that.

Update: I thank the authors for clarifying code release and the whole-image registration step; I'm looking forward to the additional figures to be shown in the supplement. I do not see any additional concerns after the review phase. I still think this is a good paper.
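
The question above about how whole-image and local deformations are weighted can be made concrete with a small sketch. It is not the paper's p(V); it assumes a generic regularizer with two terms, a penalty on the spatial gradient of a displacement field and a penalty on its magnitude. The function names `gradient_penalty` and `magnitude_penalty`, the field shapes, and the test fields are hypothetical and chosen only for illustration.

```python
import numpy as np

def gradient_penalty(u):
    """Mean squared finite-difference gradient of a displacement field u of shape (H, W, 2)."""
    dy = np.diff(u, axis=0)  # differences between vertically adjacent pixels
    dx = np.diff(u, axis=1)  # differences between horizontally adjacent pixels
    return float((dy ** 2).mean() + (dx ** 2).mean())

def magnitude_penalty(u):
    """Mean squared displacement magnitude per component."""
    return float((u ** 2).mean())

H = W = 32

# Whole-image translation: every pixel moves by the same vector (3, 3).
global_shift = np.full((H, W, 2), 3.0)

# Local deformation: a Gaussian bump of displacement centred in the image
# (assumed shape; only the first displacement component is non-zero).
yy, xx = np.mgrid[0:H, 0:W]
bump = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 3.0 ** 2))
local_warp = np.stack([3.0 * bump, np.zeros_like(bump)], axis=-1)

for name, field in [("global shift", global_shift), ("local warp", local_warp)]:
    print(name, gradient_penalty(field), magnitude_penalty(field))
```

Under these assumptions, the gradient term is exactly zero for the translation and positive for the local bump, while the magnitude term charges the translation more than the bump. The relative weights of two such terms would therefore decide which kind of deformation is cheap. A rotation is not free under the gradient term, since its displacement varies linearly across the image, which is one reason a separate whole-image alignment step is commonly used.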

Paper ID:	405