I have been thinking about why learning the weights of the ER inference rules separately or jointly leads to essentially the same weights given the same ground data (evidence and ground truth), as shown in the slides today. To understand the reason, I had to dig back into my notes from when I was first learning about Markov networks in February 2014. While stratification might play a part, the actual reason is that the query predicates in my rules are conditionally independent of each other given the evidence predicates in the ground Markov network. As you recall, the ER inference rules have the following form:

Evid1(X) ^ Evid2(X) ^ Evid3(X) => QUERY1(X)

Evid1(X) ^ !Evid2(X) ^ Evid4(X) => QUERY2(X)

Given a domain of values for X, the nodes of QUERY1(X) in the ground Markov network are always conditionally independent of the nodes of QUERY2(X) given the nodes of Evid1(X), Evid2(X), Evid3(X), and Evid4(X), because every path between them passes through an evidence node. Correspondingly, the ground atoms of QUERY1 and QUERY2 never appear together in the same potential function of the factorized joint probability distribution. The optimization process therefore fits the weight of each rule independently of the other, and as a consequence the weights are the same whether they are learned together or individually.
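
The argument can be checked numerically with a small sketch. It assumes discriminative weight learning (maximizing the conditional log-likelihood of the query atoms given fully observed evidence) by plain gradient ascent, which may differ from what the actual learner does. The synthetic data, the helper names (`body1`, `body2`, `satisfied`, `gradient`, `learn`), the learning rate, and the step count are all made up for illustration.

```python
import itertools
import math
import random

# Hypothetical rule bodies; evidence tuple e = (Evid1, Evid2, Evid3, Evid4).
def body1(e):  # Evid1(X) ^ Evid2(X) ^ Evid3(X)
    return e[0] and e[1] and e[2]

def body2(e):  # Evid1(X) ^ !Evid2(X) ^ Evid4(X)
    return e[0] and (not e[1]) and e[3]

def satisfied(body, q):
    """Truth value (1.0 or 0.0) of the ground implication body => q."""
    return 1.0 if (not body) or q else 0.0

def gradient(weights, bodies, data):
    """Gradient of the conditional log-likelihood: for each rule, the
    observed count of satisfied groundings minus the expected count.
    The expectation enumerates every joint assignment of the query atoms
    for one constant, so no independence is assumed in the computation."""
    k = len(weights)
    states = list(itertools.product([False, True], repeat=k))
    grad = [0.0] * k
    for evid, truth in data:
        b = [body(evid) for body in bodies]
        feats = [[satisfied(b[i], s[i]) for i in range(k)] for s in states]
        scores = [math.exp(sum(w * f for w, f in zip(weights, fs)))
                  for fs in feats]
        z = sum(scores)  # partition function for this constant
        for i in range(k):
            expected = sum(sc * fs[i] for sc, fs in zip(scores, feats)) / z
            grad[i] += satisfied(b[i], truth[i]) - expected
    return grad

def learn(bodies, data, steps=500, lr=0.05):
    """Plain gradient ascent, starting from zero weights."""
    weights = [0.0] * len(bodies)
    for _ in range(steps):
        g = gradient(weights, bodies, data)
        weights = [w + lr * gi for w, gi in zip(weights, g)]
    return weights

# Synthetic ground data: one (evidence, (QUERY1, QUERY2)) pair per constant.
random.seed(0)
data = []
for _ in range(200):
    e = tuple(random.random() < 0.5 for _ in range(4))
    q1 = random.random() < (0.9 if body1(e) else 0.5)
    q2 = random.random() < (0.7 if body2(e) else 0.5)
    data.append((e, (q1, q2)))

# Learn both weights together, then each rule on its own.
joint = learn([body1, body2], data)
separate = [
    learn([body1], [(e, (q[0],)) for e, q in data])[0],
    learn([body2], [(e, (q[1],)) for e, q in data])[0],
]
print("joint:   ", joint)
print("separate:", separate)
```

Under these assumptions the two lists should agree up to floating-point error: the per-constant partition function factorizes into one term per rule, so the gradient with respect to one weight does not depend on the other weight.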