NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
Based on optimal transport theory, the authors develop a new approach to tackle the multiple marginal matching problem. The authors provide the details needed to derive a tractable objective function, which can eventually be formulated as a GAN problem. A theoretical analysis of the generalization of the method is conducted. Experiments on both toy and real-world data help to justify the effectiveness of the method.

In Section 4.1, the authors link the potential functions of the multiple domains through a single shared potential function with different weights. This assumption seems a bit strong; some further explanation is necessary to justify the reasonableness of this formulation. In Problem IV, the authors state that 1/N can be taken as a default value for lambda. If so, the problem becomes nearly the same as the objective function of the classical WGAN. Further considering the generators for the different domains, the whole algorithm may not differ significantly from WGAN, except that more than one generator is used. In addition, the correlation between the multiple domains seems to be investigated in a very straightforward way, by assuming a shared discriminator (in the WGAN sense) across the multiple domains. It is therefore unclear whether this simple approach is indeed helpful for exploring the correlation.

The discussion of generalization in the paper is very interesting. According to Definition 1, generalization is defined over the training sample. What about generalization over unseen test examples?

------------------------------------------------------------------------------

The authors addressed my concerns about the technical details in their rebuttal. The proposed algorithm is theoretically motivated and shows performance advantages in the experiments. This paper is interesting to me, and I would like to vote for acceptance.
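To make the concern above concrete, here is a rough sketch in my own notation (these symbols are my assumptions, not a quotation of the paper's Problem IV). With a single shared potential f and weights \lambda_k, the objective reads roughly

    \min_{G_1,\dots,G_N} \max_{f} \sum_{k=1}^{N} \lambda_k \Big( \mathbb{E}_{x \sim \mu_0}[f(x)] - \mathbb{E}_{z \sim p_z}[f(G_k(z))] \Big).

With \lambda_k = 1/N for all k, this is simply the average of N standard WGAN objectives between the source \mu_0 and each generator G_k, coupled only through the shared critic f, which is the near-equivalence to WGAN noted above.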
Reviewer 2
The goal of the Multi-Marginal Wasserstein GAN (MWGAN) is to match a source domain distribution to multiple target domain distributions. While dedicated GAN frameworks exist, as noted by the authors (CycleGAN, StarGAN, ...), their generated samples suffer from blurriness, especially when transferring to multiple targets. Moreover, the main claim of this work is that MWGAN is theoretically motivated, unlike previous works.

Computing the multi-marginal Wasserstein distance between several domains is intractable in its primal form. Thus, as proposed in WGAN, the authors express the problem in its dual form, resulting in Equation 1. Since the dual formulation is a maximization problem under infinitely many constraints, which remains intractable, the authors simplify the problem by considering only its empirical version (Equation 2). Finally, the authors argue that if the potential functions can be expressed by a single function up to multiplicative constants, then Problem III can be simplified, which they state in Problem IV as the MWGAN objective.

When it comes to training MWGAN, the framework requires two additional terms, as presented in Algorithm 1:
- inner-domain constraints: a classifier constrains generator number i to sample from the i-th domain;
- inter-domain constraints: while the generators have not yet converged, forcing the generated samples to satisfy the inequality constraints of Problem III may jeopardize training, so the authors propose a softer version that balances the loss function.

Something that is not clear: the authors seem to claim that they solve the multi-marginal Wasserstein distance, which is theoretically wrong, as they make hard approximations on the family of potential functions; this may mislead the reader. Moreover, there is no discussion of cases where this approximation may be exact, nor any discussion of its tightness. Nevertheless, the authors rely on the Theoretical Discussion section to promote the generalization ability of their method given enough training data. Again, I am not sure that those bounds hold under their approximation, and I would appreciate some clarification.

As for the experimental section, experiments have been conducted thoroughly on several datasets, using two criteria: FID (which is not useful in the multi-target transfer setting) and a classifier trained on the ground-truth data to recognize the domains. An AMT perceptual evaluation, as conducted by StarGAN, would also have been interesting. That said, the results and their illustrations are promising. I am curious about the composite generator: when applying multiple attributes, how do you pick the order of composition (I expect it not to commute)?

In a nutshell, here are the main pros and cons of this work:
- pros: the experiments are state of the art, with high-quality generated samples for multiple attributes;
- cons: the authors overstate the theoretical guarantees conferred by the multi-marginal Wasserstein distance, without any analysis of the tightness of their approximation of the potential functions.
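For reference, the intractable dual referred to above presumably has the standard multi-marginal Kantorovich form (this is the textbook formulation, written in my own notation, not a quotation of the paper's Equation 1):

    \sup_{f_0,\dots,f_N} \sum_{k=0}^{N} \mathbb{E}_{x \sim \mu_k}[f_k(x)] \quad \text{s.t.} \quad \sum_{k=0}^{N} f_k(x_k) \le c(x_0,\dots,x_N) \ \ \forall (x_0,\dots,x_N).

The shared-potential assumption criticized above replaces the N+1 potentials with f_k = \lambda_k f for a single f, which restricts the feasible set and is precisely the source of the approximation whose tightness is not analyzed.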
Reviewer 3
Originality: This paper solves the multiple marginal matching (M3) problem by first defining a multi-marginal Wasserstein algorithm.

Quality: The overall structure of this work is generally consistent. Under a specific condition, the paper gives technically sound analysis, including the equivalence of solutions and a generalization analysis, and both the theoretical analysis and the empirical experimental results support the proposed algorithm.

Clarity: The paper is written clearly and is easy to follow.

Significance: This work makes a moderate advance on the M3 problem. Under a specific and rigorous condition, the authors have done adequate work both theoretically and experimentally. The only remaining problem is how real problems can satisfy that condition.

Some concerns:
1) The whole work stands on the condition that a shared potential function is sufficient for Problem 1 in the paper, but the authors only use the experimental results in Appendix I to show its practicability on some real-world tasks. This seems weak.
2) Theorem 1 rests on a key assumption which states that "(f_0, \cdots, f_N) and (\lambda_0 f, \cdots, \lambda_N f) are solutions to Problem 1". How can you verify this assumption? The key question I want to know is: why can you replace N different potential functions with a single unified function f and N constant factors? Or, within what distance among the multiple domains can you make such a replacement?
3) Toy data experiment: can you explain in more detail how to understand Figure 2, especially the value surface of the discriminator?
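To make the tightness question in concern 2) concrete (notation mine, for illustration only): restricting the dual potentials to the shared form f_k = \lambda_k f can only shrink the feasible set of the maximization, so the restricted problem lower-bounds the true multi-marginal distance,

    \sup_{f:\ \sum_k \lambda_k f(x_k) \le c(x_0,\dots,x_N)} \sum_{k=0}^{N} \lambda_k\, \mathbb{E}_{x \sim \mu_k}[f(x)] \ \le\ W(\mu_0,\dots,\mu_N).

The assumption in Theorem 1 postulates exactly that this inequality is tight, and the open question is under what conditions on the domains this holds.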