AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



(see [18, 25]) that the following recurrence formula for rewards holds:

$$\nu_i(n) = \sum_{j=1}^{N} p_{ij}\,\bigl[r_{ij} + \nu_j(n-1)\bigr]. \tag{1}$$

Another form of this formula is:

$$\nu_i(n) = \sum_{j=1}^{N} p_{ij}\,r_{ij} + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1), \tag{2}$$

where the first sum,

$$q_i = \sum_{j=1}^{N} p_{ij}\,r_{ij}, \tag{3}$$

is the expected one-step reward of the process.
Now we can write

$$\nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{4}$$
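As a quick numerical check of the step from (1) to (4), the following Python sketch iterates both forms on a small two-state chain and confirms that they produce the same values. The matrices `P` and `R`, the zero starting values, and all function names are illustrative assumptions and do not come from the paper.

```python
def one_step_rewards(P, R):
    """Expected one-step rewards q_i = sum_j p_ij * r_ij, as in equation (3)."""
    return [sum(p * r for p, r in zip(p_row, r_row)) for p_row, r_row in zip(P, R)]


def step_eq1(P, R, v_prev):
    """One step of equation (1): v_i(n) = sum_j p_ij * (r_ij + v_j(n-1))."""
    size = len(P)
    return [
        sum(P[i][j] * (R[i][j] + v_prev[j]) for j in range(size))
        for i in range(size)
    ]


def step_eq4(P, q, v_prev):
    """One step of equation (4): v_i(n) = q_i + sum_j p_ij * v_j(n-1)."""
    size = len(P)
    return [
        q[i] + sum(P[i][j] * v_prev[j] for j in range(size))
        for i in range(size)
    ]


# Illustrative two-state example (assumed values, not taken from the paper).
P = [[0.5, 0.5], [0.4, 0.6]]    # transition probabilities p_ij
R = [[9.0, 3.0], [3.0, -7.0]]   # transition rewards r_ij

q = one_step_rewards(P, R)

v_eq1 = [0.0, 0.0]  # v(0), assumed to be zero
v_eq4 = [0.0, 0.0]
for _ in range(5):
    v_eq1 = step_eq1(P, R, v_eq1)
    v_eq4 = step_eq4(P, q, v_eq4)

print("q =", q)
print("v(5) via (1):", v_eq1)
print("v(5) via (4):", v_eq4)
```

Both loops should agree up to floating-point rounding, since (4) only regroups the terms of (1).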

Taking the discount factor β into account, we obtain:

$$\nu_i(n, \beta) = q_i + \beta \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{5}$$

Let us write this formula in vector form:

$$\nu(n, \beta) = q + \beta P\,\nu(n-1), \qquad n = 1, 2, \ldots \tag{6}$$
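The vector recursion can be sketched in a few lines of Python. This is a minimal illustration, assuming a row-stochastic matrix `P` stored as nested lists, a reward vector `q`, and a starting vector `v0` standing for ν(0); the numerical values and the names `mat_vec` and `discounted_values` are hypothetical and not taken from the paper.

```python
def mat_vec(P, v):
    """Matrix-vector product P v for a matrix stored as a list of rows."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def discounted_values(P, q, beta, v0, n):
    """Apply v(k) = q + beta * P v(k-1) for k = 1..n, starting from v(0) = v0."""
    v = list(v0)
    for _ in range(n):
        v = [qi + beta * x for qi, x in zip(q, mat_vec(P, v))]
    return v


# Illustrative data (assumed, not from the paper).
P = [[0.5, 0.5], [0.4, 0.6]]
q = [6.0, -3.0]
beta = 0.9

for n in range(4):
    print(n, discounted_values(P, q, beta, [0.0, 0.0], n))
```

With `n = 0` the function simply returns the starting vector, so the recursion itself is only applied from `n = 1` onward.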

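It is easy to see that

$$\begin{aligned}
\nu(1) &= q + \beta P\,\nu(0), \\
\nu(2) &= q + \beta P\,\nu(1) = q + \beta P\bigl(q + \beta P\,\nu(0)\bigr) = q + \beta P q + \beta^{2} P^{2}\,\nu(0),
\end{aligned} \tag{7}$$

and, in general,

$$\nu(n, \beta) = q + \beta^{n} P^{n}\,\nu(0) + \sum_{k=1}^{n-1} \beta^{k} P^{k} q. \tag{8}$$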
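The unrolled expression can be checked numerically against the step-by-step recursion ν(k) = q + βPν(k-1). The Python sketch below reads the sum in the unrolled formula as running over k = 1, ..., n-1 of β^k P^k q, which is an interpretation of the text; the example data and the function names are assumptions made for illustration only.

```python
def mat_vec(P, v):
    """Matrix-vector product P v for a matrix stored as a list of rows."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def by_recursion(P, q, beta, v0, n):
    """Iterate v(k) = q + beta * P v(k-1) for k = 1..n."""
    v = list(v0)
    for _ in range(n):
        v = [qi + beta * x for qi, x in zip(q, mat_vec(P, v))]
    return v


def by_unrolled_formula(P, q, beta, v0, n):
    """q + beta^n P^n v0 + sum_{k=1}^{n-1} beta^k P^k q, assuming n >= 1."""
    total = list(q)   # the standalone q term
    term = list(q)    # holds beta^k P^k q, starting at k = 0
    for _ in range(1, n):  # k = 1 .. n-1
        term = [beta * x for x in mat_vec(P, term)]
        total = [t + x for t, x in zip(total, term)]
    tail = list(v0)   # becomes beta^n P^n v0 after n applications
    for _ in range(n):
        tail = [beta * x for x in mat_vec(P, tail)]
    return [t + x for t, x in zip(total, tail)]


# Illustrative data (assumed, not from the paper).
P = [[0.5, 0.5], [0.4, 0.6]]
q = [6.0, -3.0]
beta = 0.9
v0 = [1.0, 2.0]

max_gap = 0.0
for n in range(1, 8):
    a = by_recursion(P, q, beta, v0, n)
    b = by_unrolled_formula(P, q, beta, v0, n)
    max_gap = max(max_gap, max(abs(x - y) for x, y in zip(a, b)))

print("largest difference between the two computations:", max_gap)
```

The two computations should differ only by rounding error, which supports the reading of the sum used here.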

Taking into account the fact that

$$q = \beta^{0} P^{0} q$$


