NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 1475
Title: On the Calibration of Multiclass Classification with Rejection

Given the disparity in the reviewers' scores and the authors' confidential comments to the AC, I looked at the paper carefully myself. My impressions of the paper's main contributions are the following:

1. A unified study of confidence-based and rejection-based methods for multiclass classification with a reject option. (Significance: Medium-high)
2. Necessary and sufficient conditions for calibration of rejection-based methods, together with a demonstration that two natural rejection-based surrogates (APC-exp, MPC-exp) do not satisfy the necessary condition and are therefore not calibrated. (Significance: Medium)
3. Derivation of excess risk bounds for confidence-based methods based on class probability estimation. (Significance: Low)
4. Experimental comparisons of the two classes of approaches. (Significance: Medium)

In light of this, I would like to recommend a (weak) accept, subject to the following conditions:

1. Since the paper establishes calibration failure of only the MPC-exp and APC-exp rejection-based surrogates -- and not of all rejection-based surrogates in general -- the authors must re-word claims throughout the paper accordingly, making clear that it remains an open question whether other rejection-based surrogates might be calibrated (their current results do not appear to conclusively rule out this possibility). I would also encourage them to see whether they can establish failure of the MPC-log and APC-log surrogates used in their experiments.
2. The authors must correct their citations to Ramaswamy et al. [20] as follows: (a) p. 1: "...empirical performance is not convincing" -- this needs to be more precise about what exactly is not convincing (or the language should be changed); (b) p. 3: "...defined calibration in this problem as follows" -- as far as I remember, there is no such definition in Ramaswamy et al. [20].
3. The authors must include important experimental details, such as the number of classes K, in the main text.

All of the above changes should be relatively easy to implement. I note that Reviewer 2 recommended rejecting the paper because some of the main claims are currently not stated accurately. However, I believe that with the above changes the claims will be accurately supported and the results will be of value to the NeurIPS community; this is why I am recommending a (weak) accept. I discussed this proposal with the reviewers and they did not object to it. I also note that if the paper is indeed accepted, it is extremely important that the authors implement the changes recommended above.