NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
In this paper the authors analyze the trajectories of variables in ADMM, and make use of these descriptions to propose an adaptation of ADMM. I really liked this paper. The presentation of the paper is ok, in the sense of content and explanations, although there are lots of typos and weird sentences (see some below). In the definition of partial smoothness, you need a function, a point, and a manifold M_x. However in line 92, it reads "Suppose R \in PSF_x(M^R_x)", and the set M^R_x is not defined above (and the same for J) I would have liked to see the trajectories in the experimental section as well, since Fig 1 seems to be for a toy example. Specially for the case not included in the theorems (R and J not polyhedral, but spiral trajectory). Minor comments: Please take a look at this sentence in line 50. "Given an arbitrary [...] converges faster to z*". It's missing a verb. Line 19: "becomes" should be "became", and the use of "owing to" is weird. I would suggest something like "ADDM was first proposed in [13] and has recently gained increasing popularity due to [6]." Line 118: depending the leading eigenvalue -> depending ON the leading eigenvalue Line 125: when it failures -> when it fails Line 128: The above scheme can reformulated -> The above scheme can BE reformulated Line 144: implies -> imply Also take a look at the sentence in line 167: "As E_s,q contains the sum ...". The verb is also missing in this sentence.
Reviewer 2
There are a great many papers written on the ADMM algorithm, and it is often very difficult to see and gauge the novelty of the submission. Thankfully, this paper actually does make a very nice contribution, which is clearly articulated, so I am happy to be able to support this work. The paper is well written and organised. The key observation is highlighted immediately so that the reader can see what the contribution (and therefore value) of the work is. A new algorithm naturally arises, theoretical results are stated to support the observation and algorithm, and there is a good number of numerical experiments to support the paper. I think this paper will be of wide interest to the community.
Reviewer 3
The paper talks about the trajectory of z_k in ADMM under convex setup. Authors prove that, near the local stationary point, v_k = z_k - z_{k-1} equals to a rotation of v_{k-1} plus some smaller order terms. When objective functions are polyhedral, smaller order terms can be dropped off. Moreover, when either of objective functions is smooth and A is full-rank square matrix, the rotation matrix has real eigenvalues for large enough gamma. Under this result, authors show some examples that ADMM with inertial acceleration fails in. In particular, authors study the Lasso problem and show when gamma is small, inertial acceleration does not work out. Motivated from this phenomenon, authors further propose A3DMM to do extrapolation, which is also tested in experiments. This submission has good quality and clarity overall. The analysis is also significant. But I have few concerns: 1, why do we focus on z_k, an intermediate variable of ADMM, instead of psi_k. If similar results can be proved for dual variable, and acceleration step is applied on either x_k (or y_k) or psi_k (which is usually the case), then it makes more sense to me. 2, for the theoretical analysis of z_k, I don't see when direction of z_k will be a straight line. The straight line should correspond to M = I, but this result is not discussed in the paper. 3, for Proposition 2.4, authors claim that if gamma is smaller than threshold, M has complex eigenvalues. but trajectory could still be spiral (or straight line? again, why straight line is not clear). How does this argument predict the behavior of inertial ADMM? I don't see the threshold of gamma distinguishes the behavior of z_k clearly. Also, authors need explain why 2.4 implies straight line for large gamma. 4, It's not clear if the failure of inertial acceleration is due to the z_k-z_{k-1} term from the experiments. Authors should show some iteration trajectories on this term from figures. For many variant ADMM, it's common to have step size parameter before gamma, and analysis will impose conditions on step size. It's not surprising to see inertial ADMM fails for small gamma. To claim the failure is because of direction z_k - z_{k-1}, authors should provide evidence. 5, different from analyzing direction of z_k, it seems that some extra conditions are imposed on M when showing convergence of A3DMM. However, M only has existence argument in previous section. How those conditions on M reduce the function candidates is not very clear. The experiments also ignore the condition on M.