AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



(see [18, 25]) that the following recurrence formula for rewards holds:

$$ \nu_i(n) = \sum_{j=1}^{N} p_{ij} \left[ r_{ij} + \nu_j(n-1) \right]. \tag{1} $$

Another form of this formula is:

$$ \nu_i(n) = \sum_{j=1}^{N} p_{ij} r_{ij} + \sum_{j=1}^{N} p_{ij} \nu_j(n-1), \tag{2} $$

where the first sum,

$$ q_i = \sum_{j=1}^{N} p_{ij} r_{ij}, \tag{3} $$

is the expected one-step reward of the process in state i.
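The one-step reward in (3) is a row-wise sum of products, which a short NumPy sketch can illustrate. The two-state matrices `P` and `R` below are hypothetical values chosen only for illustration; they do not come from the source.

```python
import numpy as np

# Hypothetical 2-state example (illustrative values, not from the paper).
P = np.array([[0.5, 0.5],
              [0.4, 0.6]])    # transition probabilities p_ij; each row sums to 1
R = np.array([[9.0, 3.0],
              [3.0, -7.0]])   # transition rewards r_ij

# q_i = sum_j p_ij * r_ij: elementwise product followed by a sum over each row.
q = (P * R).sum(axis=1)
print(q)
```

For these assumed values the result is approximately q = (6, -3), up to floating-point rounding.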
Now we can write

$$ \nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij} \nu_j(n-1). \tag{4} $$

Taking the discount factor β into account, we obtain:

$$ \nu_i(n, \beta) = q_i + \beta \sum_{j=1}^{N} p_{ij} \nu_j(n-1). \tag{5} $$

Let us write this formula in vector form:

$$ \nu(n, \beta) = q + \beta P \, \nu(n-1), \quad n = 0, 1, 2, \ldots \tag{6} $$
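The vector recursion (6) can be iterated directly. The following is a minimal sketch, assuming a hypothetical two-state chain, a discount factor of 0.9, and a zero initial vector ν(0); the function name `discounted_values` and all numeric values are illustrative assumptions, not taken from the source.

```python
import numpy as np

def discounted_values(P, q, beta, v0, n_steps):
    """Apply v <- q + beta * P @ v exactly n_steps times, starting from v0."""
    v = np.asarray(v0, dtype=float)
    for _ in range(n_steps):
        v = q + beta * (P @ v)
    return v

# Hypothetical inputs (illustrative values, not from the paper).
P = np.array([[0.5, 0.5],
              [0.4, 0.6]])   # transition matrix
q = np.array([6.0, -3.0])    # one-step rewards q_i
beta = 0.9                   # assumed discount factor
v0 = np.zeros(2)             # assumed initial rewards nu(0)

v1 = discounted_values(P, q, beta, v0, 1)   # equals q when nu(0) = 0
v2 = discounted_values(P, q, beta, v0, 2)   # q + beta * P @ q
print(v1, v2)
```

With ν(0) = 0 the first step returns q itself, and each further step adds one more discounted term.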

It is easy to notice that

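$$ \nu(1) = q + \beta P \nu(0) \tag{7} $$

$$ \nu(2) = q + \beta P \nu(1) = q + \beta P \left( q + \beta P \nu(0) \right) = q + \beta P q + \beta^2 P^2 \nu(0) \tag{8} $$

$$ \nu(n, \beta) = q + \beta^n P^n \nu(0) + \sum_{k=1}^{n-1} \beta^k P^k q $$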
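The closed-form expression for ν(n, β) can be checked numerically against step-by-step iteration of the recursion. The sketch below writes the summation index as k and assumes a hypothetical two-state chain, β = 0.9, and an arbitrary ν(0); the function names and all numeric values are illustrative assumptions, not part of the source.

```python
import numpy as np
from numpy.linalg import matrix_power

def iterate(P, q, beta, v0, n):
    """Apply v <- q + beta * P @ v exactly n times, starting from v0."""
    v = np.asarray(v0, dtype=float)
    for _ in range(n):
        v = q + beta * (P @ v)
    return v

def closed_form(P, q, beta, v0, n):
    """Evaluate q + beta^n P^n v0 + sum_{k=1}^{n-1} beta^k P^k q, for n >= 1."""
    v = q + beta**n * (matrix_power(P, n) @ v0)
    for k in range(1, n):
        v = v + beta**k * (matrix_power(P, k) @ q)
    return v

# Hypothetical inputs (illustrative values, not from the paper).
P = np.array([[0.5, 0.5],
              [0.4, 0.6]])
q = np.array([6.0, -3.0])
beta = 0.9
v0 = np.array([1.0, 2.0])
n = 5

by_recursion = iterate(P, q, beta, v0, n)
by_formula = closed_form(P, q, beta, v0, n)
print(np.allclose(by_recursion, by_formula))  # True
```

The two results agree up to floating-point rounding, since unrolling the recursion n times produces exactly the terms of the closed form.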

Taking into account the fact that

$$ q = \beta^0 P^0 q $$


