NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
This paper addresses knowledge base completion, i.e., predicting missing relations between pairs of entities, by combining a statistical relational model (Markov logic networks, MLNs) with a knowledge graph embedding (KGE) method such as TransE. The authors define a set of rules for the MLN and then a joint probability distribution over the observed and hidden triplets. Similarly, they define a joint probability distribution using a KGE approach (specifically, the TransE model). They then employ the variational EM algorithm to learn the MLN weights and finally predict the probabilities of the hidden triplets.
Originality: I really liked the paper and thoroughly enjoyed reading it. Although several papers have looked into combining rule-based and KGE approaches, combining KGE with a pure SRL model like Markov logic is new, as far as I know.
Quality: The paper has an adequate amount of theory, complemented well by experiments on four well-known datasets. The ablation studies, over both the KGE models and the MLN rules, show the effectiveness of the approach. However, I have the following issues: (1) The authors have not provided details about the MLN rules they created, and I could not find the exact rules in the code either. For example, for which relations did they create MLN formulas? (2) They learn the MLN weights using vanilla gradient descent, whereas almost all current MLN approaches use a stronger method, such as preconditioned scaled conjugate gradient, or L-BFGS in the case of pseudo-log-likelihood. Is there a specific reason for not choosing those methods? (3) They have yet to compare their method against a SOTA method such as RotatE.
Clarity: The paper is written very clearly, and the necessary theorems and proofs are provided.
Significance: This is a significant work, but the lack of comparison against the SOTA method makes it a weak accept.
Some typos: Line 143: Varational -> Variational. Eq. 6: should be p_w instead of p.
UPDATE: I have read the authors' feedback and am convinced by their response. I vote for accepting this work for a poster presentation.
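For readers unfamiliar with the embedding model named in this review, the following is a minimal sketch of the TransE idea: a triplet (h, r, t) is plausible when the tail embedding lies close to h + r. The helper names, the margin `gamma`, and the sigmoid mapping from distance to probability are assumptions made for illustration; they are not taken from the paper or its code.

```python
import numpy as np

def transe_distance(h, r, t, norm=1):
    """TransE distance ||h + r - t||; a smaller value means a more plausible triplet."""
    return float(np.linalg.norm(h + r - t, ord=norm))

def triple_probability(h, r, t, gamma=1.0):
    """Hypothetical mapping from distance to a probability: sigmoid(gamma - distance)."""
    return 1.0 / (1.0 + np.exp(-(gamma - transe_distance(h, r, t))))

# Toy 2-d embeddings (made-up values, for illustration only).
h = np.array([0.5, 0.0])
r = np.array([0.5, 1.0])
t_good = np.array([1.0, 1.0])    # equals h + r, so the distance is zero
t_bad = np.array([-1.0, -1.0])   # far from h + r

p_good = triple_probability(h, r, t_good)
p_bad = triple_probability(h, r, t_bad)
```

In this toy setting the tail that matches h + r receives the higher probability, which is the behaviour a KGE-based distribution over hidden triplets relies on.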
Reviewer 2
This paper introduces a way to combine a Markov (logic) network with knowledge graph embeddings. In particular, the approach uses EM to train the weights of a Markov logic network in the M-step while inferring latent triple states in the E-step, using a KG embedding model as the variational distribution. Results on various standard benchmarks are convincing. I think this paper is well written and relatively clear. The idea is straightforward, but in a good way (the kind of thing I thought people would have tried much earlier but haven't). The results are convincing and reasonably ablated. There are various approximations/heuristics that the authors use to make this tractable (e.g., sampling the Markov blanket before calculating the expectation). These have less theoretical grounding, but the empirical results justify them. The paper could do a better job of discussing recent related work such as "Learning Explanatory Rules from Noisy Data", "End-to-End Differentiable Proving" and "Adversarial Sets for Regularising Neural Link Predictors", which are either related or very closely related (the last one, for example).
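The role of the Markov blanket mentioned above can be illustrated with the standard MLN conditional: once the truth values of the neighbouring triplets are fixed, the probability of a target triplet is a sigmoid of the weighted difference in satisfied ground rules. The sketch below is a simplified illustration, not the authors' implementation. It assumes the target triplet appears only as the conclusion of each ground rule, and the rule name and weight are hypothetical.

```python
import math

def implication(premises, conclusion):
    """Truth value (0 or 1) of the ground rule: all premises => conclusion."""
    return 0 if all(premises) and not conclusion else 1

def conditional_prob(groundings, weights):
    """P(target = 1 | Markov blanket) in an MLN.

    groundings: list of (rule_id, premise_truth_values); the target triplet
    is assumed to be the conclusion of every ground rule listed.
    """
    delta = 0.0
    for rule_id, premises in groundings:
        # Change in the satisfied-formula count when the target flips from 0 to 1.
        delta += weights[rule_id] * (implication(premises, 1) - implication(premises, 0))
    return 1.0 / (1.0 + math.exp(-delta))

weights = {"composition": 2.0}  # hypothetical rule weight

# Both premises true: the rule is violated unless the target is true.
p_supported = conditional_prob([("composition", [1, 1])], weights)

# One premise false: the rule is satisfied either way, so it carries no evidence.
p_neutral = conditional_prob([("composition", [1, 0])], weights)
```

In the approach as the review describes it, the blanket truth values would be sampled rather than fixed by hand; here they are hard-coded to keep the example small.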
Reviewer 3
The paper is clearly written. I find the variational EM training of MLNs to be interesting. In the E-step, it cleverly uses graph embeddings to sample the truth values of hidden triplets, thereby completely defining the Markov blanket of each triplet for the M-step. This circumvents the need for (intractable) inference to obtain the values of the hidden triplets. However, I do not find the M-step to be novel, because the maximization of pseudolikelihood for MLNs is standard fare (e.g., [20]). I have two misgivings about the paper, both of which relate to the recent hybrid approaches RUGE [15] and NNE-AER [9]. First, the paper does not position its proposed system clearly vis-a-vis RUGE and NNE-AER. Both systems combine knowledge graph embeddings with first-order logic, and their logical rules are soft and hence capture the rules' inherent uncertainty. The rules are also encapsulated in a maximization equation in a principled manner. Hence, both systems seem to integrate soft logical rules with graph embeddings in as principled a manner as the paper's system. This raises the question of why the paper's system does better than RUGE and NNE-AER in the experiments. Second, I do not think the experimental comparison with RUGE and NNE-AER is truly apples-to-apples. The empirical numbers are taken from the RUGE and NNE-AER papers. In those papers, NNE-AER only considers "non-negativity" rules and "approximate entailment" rules, and RUGE only considers Horn clauses of length at most 2. In contrast, this paper considers four rule types: composition rules, inverse rules, symmetric rules, and subrelation rules. Hence, it is possible that the paper is doing better than RUGE and NNE-AER simply because it considers a different (and possibly bigger) universe of rules, rather than through a more principled combination of soft logic and graph embeddings. UPDATE: In their feedback, the authors have sufficiently addressed my two misgivings with regard to RUGE and NNE-AER. Hence, I will upgrade my score to "marginally above acceptance".
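To make the four rule types named in this review concrete, the sketch below applies one instance of each to a toy knowledge graph and collects the triplets they imply. It assumes the usual readings of these rule types (stated in the code comments). The entities, relations, and the `apply_rules` helper are hypothetical and are not drawn from the paper or its datasets.

```python
def apply_rules(triples, rules):
    """Return the triplets implied by one pass of the rules that are not already present."""
    facts = set(triples)
    inferred = set()
    for kind, *rels in rules:
        if kind == "composition":      # r1(x, y) and r2(y, z) => r3(x, z)
            r1, r2, r3 = rels
            for (x, ra, y) in facts:
                for (y2, rb, z) in facts:
                    if ra == r1 and rb == r2 and y == y2:
                        inferred.add((x, r3, z))
        elif kind == "inverse":        # r1(x, y) => r2(y, x)
            r1, r2 = rels
            inferred |= {(y, r2, x) for (x, r, y) in facts if r == r1}
        elif kind == "symmetric":      # r(x, y) => r(y, x)
            (r1,) = rels
            inferred |= {(y, r1, x) for (x, r, y) in facts if r == r1}
        elif kind == "subrelation":    # r1(x, y) => r2(x, y)
            r1, r2 = rels
            inferred |= {(x, r2, y) for (x, r, y) in facts if r == r1}
    return inferred - facts

# Toy observed triplets (hypothetical).
observed = {
    ("alice", "father_of", "bob"),
    ("bob", "father_of", "carol"),
    ("alice", "married_to", "dana"),
}
rules = [
    ("composition", "father_of", "father_of", "grandfather_of"),
    ("inverse", "father_of", "child_of"),
    ("symmetric", "married_to"),
    ("subrelation", "father_of", "parent_of"),
]
hidden = apply_rules(observed, rules)
```

This treats the rules as hard implications purely to show which hidden triplets each rule type connects to the observed ones; in an MLN each rule would instead carry a learned weight, so a larger rule universe exposes more hidden triplets to such evidence.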