Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes an estimation strategy for
recovering the parameters of a finite state Markov chain given observed
stationary frequencies of some states. (The paper also considers using
additional hitting frequencies for a few state pairs, but this is not used
in the final evaluation.)
Although the problem proposed is
potentially very interesting, the paper does not appear to be in a mature
state. Some fundamental issues are not adequately addressed, while the
evaluation is limited, and the writing quality is not strong.
Note
that there is an uncountable set of ergodic transition models that can
exactly match a given subset of stationary frequencies when the number of
observed stationary state frequencies is small relative to the total
number of states. In short, the problem is ill posed. This means that
additional prior knowledge is required to recover a unique transition
model, whether via regularization or imposing a compact parametric form.
Unfortunately, the paper does not even discuss the necessity of
imposing prior knowledge to achieve non-vacuous results in this setting.
That is, even though the paper proposes a parametric representation and
considers regularization, it does not observe that the problem is ill
posed, hence that *zero* loss is obtainable for an uncountable set of
transition models. Instead, the paper merely proposes a few heuristic loss
functions and an arbitrary regularizer. This seems to be a somewhat
superficial approach to the issue. Even the paper's main motivating
example of a traffic network possesses clear prior structure: the
transition probability must be zero between unconnected links. There will
be some subsets of stationary state probabilities that can identify the
transition model in such a structured setting, but this possibility is not
even discussed.
The proposed loss functions (8) and (9) are
clearly heuristic, and I found the attempted justification below (8) to be
unconvincing. A distributional assumption can always be reverse engineered
from a loss formulation like (8), but that does not mean the
log-frequencies will have this distributional form.
The discussion
of "bias" in Section 3.2 appears to be too informal: on the context of
estimation, unbiasedness normally refers to a property that the expected
value of an estimator is the true value of the underlying parameter.
Although the loss formulation (8) is claimed to be "unbiased", no argument
is given that the expected value of the minimizer of (8) has the correct
value.
If one accepts the regularized loss formulation, the
natural gradient approach developed in Section 4 appears to be reasonable.
The technical development in this section is sound.
The
experimental evaluation appears to be limited. One synthetic and one
"real" experiment are conducted.
In the synthetic experiment, the
model exactly matches that used by the paper's proposed estimator, which
is an ideal scenario that cannot be guaranteed in practice. Here the
proposed method's results are compared to a baseline estimator (NWKR) that
does not exploit the fact that the target stationary frequencies are
associated with a Markov chain. This seems to be a weak straw-man
competitor, and it is not too surprising that it performs poorly by
comparison.
In the real experiment, it is somewhat disappointing
that hitting rate observations are not included in the observations, which
means that the initial distribution is irrelevant to the estimation
problem (since only stationary frequencies are being used). Nevertheless,
this is an interesting experimental domain. The results look reasonable,
although again, the baseline competitor is too weak to take very
seriously. A more credible competitor would identify a transition model
that exactly matches the observed stationary frequencies (and the sparsity
pattern implied by the road network) while maximizing entropy. The
experimental evaluation would also be more convincing if results could be
obtained for a larger number of synthetic and real
domains. Q2: Please summarize your review in 1-2
sentences
The paper considers and interesting problem:
estimating the parameters for a Markov chain from observed stationary
frequencies of a subset of states. Unfortunately, the proposed estimator,
based on a regularized loss minimization, is heuristic and needs a
stronger justification. The experimental evaluation is interesting but
brief. Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper addresses a novel and interesting problem
in Markov chain - finding transition probabilities from partial
observations with a limited number of states observable. The problem is
well motivated and formulated. A gradient based optimization method is
proposed to address the problem.
The experiment section could have
been improved. It would be interesting to see more analysis of the
experiment results; for example a quantitative evaluation of Fig.2(B) if
it is possible. Q2: Please summarize your review in 1-2
sentences
This paper proposes to addresses a new and interesting
problem. The paper is well written. Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper addresses the problem of learning the
parameters of a Markov model (not necessarily as a table, but possibly
parameterized in some arbitrary way), based upon partial observations of
state visitation frequencies (some states may not be observed at all) and
pairwise observations of states that are visited sequentially.
The
authors describe the problem well, making a reasonable case that version
of the problem is different from others yet well motivated by practical
concerns (I may be unaware of some work in this area, so I need to take
the authors' word for this), and propose a reasonable solution to the
problem. The paper is generally clear and well written, and the results
seem good, though lack of ground truth data for the real world problem
make this a little difficult to assess fully.
I expect some
Bayesians might complain that (fill in your favorite Bayesian dogma here),
but I can live with this.
Overall, I think this is a solid
contribution.
Minor comments:
I found the use of X_0 and
caligraphic X_0 to be a bit confusing because they look very similar but
mean quite different things.
Line 186: It looks like you have a
forall in superscript.
Line 190: independent c -> independent
of c
The summation in (8) is a bit confusing. How can a state be
larger than another state? I'm assuming that you have imposed an ordering
on the states and that you are abusing notation a bit here to use the
states themselves in places where you might use the indices of the states?
Line 415: percents -> percent
Q2: Please summarize your review in 1-2
sentences
This is a nice approach to a variation on the problem
of inferring the parameters of a Markov model from partial
information.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
We very much appreciate the careful reviews, valuable
comments, and also the suggestions for our paper. Our paper will be much
better with revisions based on these comments.
We will focus on
the most critical misunderstanding of the first reviewer
(Assigned_Reviewer_4) about regularization and identifiability. This
reviewer states “[the paper] does not observe that the problem is ill
posed, hence that *zero* loss is obtainable for an uncountable set of
transition models. ... a traffic network possesses clear prior structure:
the transition probability must be zero between unconnected links. There
will be some subsets of stationary state probabilities that can identify
the transition model in such a structured setting, but this possibility is
not even discussed.”
Although it is not explicitly stated in the
original manuscript, we generally assume that a network structure is known
or roughly estimated. As the reviewer suggests, network structures or
connectivity will be available in many applications such as road networks.
So, we does use this prior knowledge of the network connectivity structure
in the transition model of Eq. (18). In Eq. (18), transition probabilities
between unconnected links are defined as zero. We believe that, when an
average degree of a graph is (much) less than the number of states, the
problem is not strongly ill-posed and then we can achieve non-vacuous
results by the use of our regularization, as is presented in the
experimental results. The theoretical analysis of this identifiability is
our important future work.
We would like to carefully add those
explanations and discussions in the final version.
|