References
[0] S. C. Zhu and X. W. Liu, "Learning in Gibbsian fields: how fast and how accurate can it be?", IEEE Trans. on PAMI, vol. 24, no. 7, pp. 1001-1006, July 2002.
[1] M. P. Almeida and B. Gidas, "A variational method for estimating the parameters of MRF from complete and incomplete data", The Annals of Applied Statistics, 3:103-136, 1993.
[2] J. Besag, "Efficiency of pseudolikelihood estimation for simple Gaussian fields", Biometrika, 64:616-618, 1977.
[3] R. Chellappa and A. Jain, Markov Random Fields: Theory and Applications, Academic Press, 1993.
[4] X. Descombes, R. Morris, J. Zerubia, and M. Berthod, "Maximum likelihood estimation of Markov random field parameters using Markov chain Monte Carlo algorithms", Int'l Workshop on Energy Minimization Methods for CVPR, Venice, Italy, May 1997.
[5] G. R. Cross and A. K. Jain, "Markov random field texture models", IEEE Trans. on PAMI, 5:25-39, 1983.
[6] S. Geman and D. McClure, "Bayesian image analysis: an application to single photon emission tomography", Proc. Statist. Comput. Sect. Amer. Stat. Assoc., pp. 12-18, Washington, DC, 1985.
[7] C. J. Geyer, "On the convergence of Monte Carlo maximum likelihood calculations", J. of Royal Stat. Soc. B, vol. 56, pp. 261-274, 1994.
[8] M. Jerrum and A. Sinclair, "Polynomial-time approximation algorithms for the Ising model", SIAM Journal on Computing, vol. 22, pp. 1087-1116, 1993.
[9] G. G. Potamianos and J. Goutsias, "Stochastic approximation algorithms for partition function estimation of Gibbs random fields", IEEE Trans. on Info. Theory, vol. 43, pp. 1948-1965, 1997.
[10] J. Shah, "Minimax entropy and learning by diffusion", Proc. of CVPR, Santa Barbara, CA, 1998.
[11] L. Younes, "Estimation and annealing for Gibbsian fields", Annales de l'Institut Henri Poincaré, Section B, Calcul des Probabilités et Statistique, 24:269-294, 1988.
[12] S. C. Zhu, Y. Wu, and D. Mumford, "Minimax entropy principle and its application to texture modeling", Neural Computation, vol. 9, no. 8, Nov. 1997.
Fig.2.a: Stochastic gradient methods estimate the parameters by computing a set of reference (or "satellite") models online. Fig.2.b: The pseudolikelihood method uses the uniform distribution as its single reference. Fig.2.c: Maximum satellite likelihood methods make use of a set of precomputed reference models for Monte Carlo integration.
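The Monte Carlo integration in Fig.2.c can be illustrated on a toy example. The sketch below (an assumption for illustration, not the paper's implementation: the 2-D Ising model, grid size, and the names `energy`, `gibbs_sample`, and `log_z_ratio` are all ours) estimates the log partition-function ratio between a target model and a single precomputed reference model by importance sampling, which is the basic computation a satellite model supports:

```python
import math
import random

def energy(x, n, beta):
    """Ising energy U(x; beta) = -beta * sum of agreements over 4-neighbor pairs."""
    e = 0.0
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                e -= beta * x[i][j] * x[i + 1][j]
            if j + 1 < n:
                e -= beta * x[i][j] * x[i][j + 1]
    return e

def gibbs_sample(n, beta, sweeps, rng):
    """Draw one approximate sample at temperature `beta` by Gibbs sweeps."""
    x = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                s = 0  # sum of the 4-neighborhood spins
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        s += x[a][b]
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * s))
                x[i][j] = 1 if rng.random() < p_plus else -1
    return x

def log_z_ratio(n, beta, beta0, num_samples=200, sweeps=30, seed=0):
    """Estimate log(Z(beta)/Z(beta0)) from a reference ("satellite") model
    at beta0, using Z(beta)/Z(beta0) = E_{p0}[exp(-(U(x;beta) - U(x;beta0)))]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = gibbs_sample(n, beta0, sweeps, rng)
        total += math.exp(-(energy(x, n, beta) - energy(x, n, beta0)))
    return math.log(total / num_samples)
```

The precomputation pays off because the samples from the reference model can be stored once and reused to evaluate many candidate parameter values; the estimator is only reliable when the target stays close to the satellite, which is why a *set* of reference models is used in practice.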
Fig.1: (a) likelihood; (b) pseudolikelihood; (c) patch likelihood; (d) partial likelihood.
Fig.3
There is a long history of learning Gibbs models in the literature; we list some of the papers in the reference list. In practice, people often give in to the computational complexity and choose Gaussian MRF models or pseudolikelihood estimators, and most of the literature focuses on Gibbs models with pair-clique potentials. The FRAME model introduced large neighborhoods and thus made the computation much harder. There is no free lunch in estimating Gibbs models, but we can still do smart things by carefully analyzing the problem at hand.
In a recent study, we analyze four types of likelihood formulations (likelihood, pseudolikelihood, patch likelihood, and partial likelihood) and try to strike a good trade-off in choosing the right one. For details, see reference [0].
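For orientation, the two best-known of these formulations can be written down directly (a sketch under the standard definitions, for a Gibbs model p(x; θ) = Z(θ)⁻¹ exp(−U(x; θ)); the exact patch and partial forms are those of [0], not reproduced here):

```latex
% Full log-likelihood: exact, but requires the intractable partition function
L(\theta) \;=\; \log p(\mathbf{x};\theta)
          \;=\; -U(\mathbf{x};\theta) \;-\; \log Z(\theta)

% Pseudolikelihood (Besag): product of single-site conditionals, no Z(\theta)
PL(\theta) \;=\; \sum_{v} \log p\!\left(x_v \,\middle|\, x_{N(v)};\theta\right)
```

Roughly speaking, patch and partial likelihoods sit between these two extremes by conditioning on the boundaries of sub-regions larger than a single site, which is where the accuracy/computation trade-off is made.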