Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)
Amir Globerson, Tommi Jaakkola
A number of exact and approximate methods are available for inference calculations in graphical models. Many recent approximate methods for graphs with cycles are based on tractable algorithms for tree structured graphs. Here we base the approximation on a different tractable model, planar graphs with binary variables and pure interaction potentials (no external field). The partition function for such models can be calculated exactly using an algorithm introduced by Fisher and Kasteleyn in the 1960s. We show how such tractable planar models can be used in a decomposition to derive upper bounds on the partition function of non-planar models. The resulting algorithm also allows for the estimation of marginals. We compare our planar decomposition to the tree decomposition method of Wainwright et. al., showing that it results in a much tighter bound on the partition function, improved pairwise marginals, and comparable singleton marginals. Graphical models are a powerful tool for modeling multivariate distributions, and have been successfully applied in various fields such as coding theory and image processing. Applications of graphical models typically involve calculating two types of quantities, namely marginal distributions, and MAP assignments. The evaluation of the model partition function is closely related to calculating marginals [12]. These three problems can rarely be solved exactly in polynomial time, and are provably computationally hard in the general case [1]. When the model conforms to a tree structure, however, all these problems can be solved in polynomial time. This has prompted extensive research into tree based methods. For example, the junction tree method [6] converts a graphical model into a tree by clustering nodes into cliques, such that the graph over cliques is a tree. The resulting maximal clique size (cf. tree width) may nevertheless be prohibitively large. Wainwright et. al. [9, 11] proposed an approximate method based on trees known as tree reweighting (TRW). The TRW approach decomposes the potential vector of a graphical model into a mixture over spanning trees of the model, and then uses convexity arguments to bound various quantities, such as the partition function. One key advantage of this approach is that it provides bounds on partition function value, a property which is not shared by approximations based on Bethe free energies [13]. In this paper we focus on a different class of tractable models: planar graphs. A graph is called planar if it can be drawn in the plane without crossing edges. Works in the 1960s by physicists Fisher [5] and Kasteleyn [7], among others, have shown that the partition function for planar graphs may be calculated in polynomial time. This, however, is true under two key restrictions. One is that the variables xi are binary. The other is that the interaction potential depends only on xi xj (where xi {1}), and not on their individual values (i.e., the zero external field case). Here we show how the above method can be used to obtain upper bounds on the partition function for non-planar graphs. As in TRW, we decompose the potential of a non-planar graph into a sum
over spanning planar models, and then use a convexity argument to obtain an upper bound on the log partition function. The bound optimization is a convex problem, and can be solved in polynomial time. We compare our method with TRW on a planar graph with an external field, and show that it performs favorably with respect to both pairwise marginals and the bound on the partition function, and the two methods give similar results for singleton marginals.
1 Definitions and Notations Given a graph G with n vertices and a set of edges E , we are interested in pairwise Markov Random Fields (MRF) over the graph G. A pairwise MRF [13] is a multivariate distribution over variables x = {x1 , . . . , xn } defined as 1P p(x) = e ijE fij (xi ,xj ) (1) Z where fij are a set of |E | functions, or interaction potentials, defined over pairs of variables. The xP partition function is defined as Z = e ijE fij (xi ,xj ) . Here we will focus on the case where xi {1}. Furthermore, we will be interested in interaction potentials which only depend on agreement or disagreement between the signs of their variables. We define those by 1 ij (1 + xi xj ) = ij I (xi = xj ) (2) 2 so that fij (xi , xj ) is zero if xi = xj and ij if xi = xj . The model is then defined via the set of parameters ij . We use to denote the vector of parameters ij , and denote the partition function by Z ( ) to highlight its dependence on these parameters. f (xi , xj ) = A graph G is defined as planar if it can be drawn in the plane without any intersection of edges [4]. With some abuse of notation, we define E as the set of line segments in 2 corresponding to the edges in the graph. The regions of 2 \ E are defined as the faces of the graph. The face which corresponds to an unbounded region is called the external face. Given a planar graph G, its dual graph G is defined in the following way: the vertices of G correspond to faces of G, and there is an edge between two vertices in G iff the two corresponding faces in G share an edge. If the graph G is weighted, the weight on an edge in G is the weight on the edge shared by the corresponding faces in G. A plane triangulation of a planar graph G is obtained from G by adding edges such that all the faces of the resulting graph have exactly three vertices. Thus a plane triangulated graph has a dual where all vertices have degree three. It can be shown that every plane graph can be plane triangulated [4]. We shall also need the notion of a perfect matching on a graph. A perfect matching on a graph G is defined as a set of edges H E such that every vertex in G has exactly one edge in H incident on it. If the graph is weighted, the weight of the matching is defined as the product of the weights of the edges in the matching. Finally, we recall the definition of a marginal polytope of a graph [12]. Consider an MRF over a graph G where fij are given by Equation 2. Denote the probability of the event I (xi = xj ) under p(x) by ij . The marginal polytope of G, denoted by M(G), is defined as the set of values ij that can be obtained under some assignment to the parameters ij . For a general graph G the polytope M(G) cannot be described using a polynomial number of inequalities. However, for planar graphs, it turns out that a set of O(n3 ) constraints, commonly referred to as triangle inequalities, suffice to describe M(G) (see [3] page 434). The triangle inequalities are defined by 1 TRI(n) = {ij : ij + j k - ik 1, ij + j k + ik 1, i, j, k {1, . . . , n}} (3)
Note that the above inequalities actually contain variables ij which do not correspond to edges in the original graph G. Thus the equality M(G) = TRI(n) should be understood as referring only to the values of ij that correspond to edges in the graph. Importantly, the values of ij for edges not in the graph need not be valid marginals for any MRF. In other words M(G) is a projection of TRI(n) on the set of edges of G. It is well known that the marginal polytope for trees is described via pairwise constraints. It is thus interesting that for planar graphs, it is triplets, rather than pairwise The definition here is slightly different from that in [3], since here we refer to agreement probabilities, whereas [3] refers to disagreement probabilities. This polytope is also referred to as the cut polytope. 1
constraints, that characterize the polytope. In this sense, planar graphs and trees may be viewed as a hierarchy of polytope complexity classes. It remains an interesting problem to characterize other structures in this hierarchy and their related inference algorithms.
2 Exact calculation of partition function using perfect matching The seminal works of Kasteleyn [7] and Fisher [5] have shown how one can calculate the partition function for a binary MRF over a planar graph with pure interaction potentials. We briefly review Fisher's construction, which we will use in what follows. Our interpretation of the method differs somewhat from that of Fisher, but we believe it is more straightforward. The key idea in calculating the partition function is to convert the summation over values of x to the problem of calculating the sum of weights of all perfect matchings in a graph constructed from G, as shown below. In this section, we consider weighted graphs (graphs with numbers assigned to their edges). For the graph G associated with the pairwise MRF, we assign weights wij = e2ij to the edges. The first step in the construction is to plane triangulate the graph G. Let us call the resulting graph GT . We define an MRF on GT by assigning a parameter ij = 0 to the edges that have been added to G, and the corresponding weight wij = 1. Thus GT essentially describes the same distribution as G, and therefore has the same partition function. We can thus restrict our attention to calculating the partition function for the MRF on GT . As a first step in calculating a partition function over GT , we introduce the following definition: a ^ set of edges E in GT is an agreement edge set (or AES) if for every triangle face F in GT one of the ^ ^ following holds: The edges in F are all in E , or exactly one of the edges in F is in E . The weight ^ is defined as the product of the weights of the edges in E . ^ of a set E It can be shown that there exists a bijection between pairs of assignments {x, -x} and agreement edge sets. The mapping from x to an edge set is simply the set of edges such that xi = xj . It is easy to see that this is an agreement edge set. The reverse mapping is obtained by finding an assignment x such that xi = xj iff the corresponding edge is in the agreement edge set. The existence of this mapping can be shown by induction on the number of (triangle) faces. The contribution of a given assignment x to the partition function is e ^ sponds to an AES denoted by E it is easy to see that e P ij E