AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES

(see [18, 25]) that the following recurrence for the total expected reward holds:

$$\nu_i(n) = \sum_{j=1}^{N} p_{ij}\left[r_{ij} + \nu_j(n-1)\right]. \tag{1}$$
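Recurrence (1) can be checked numerically with a short Python/NumPy sketch. The function name `total_reward_rec1`, the two-state matrices `P` and `R`, and the zero starting vector are hypothetical and serve only as an illustration; they are not data from this paper.

```python
import numpy as np

def total_reward_rec1(P, R, v0, n):
    """Apply recurrence (1) n times: v_i(n) = sum_j p_ij * (r_ij + v_j(n-1))."""
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    v = np.asarray(v0, dtype=float)
    N = P.shape[0]
    for _ in range(n):
        # Build the new vector state by state from the previous one.
        v = np.array([sum(P[i, j] * (R[i, j] + v[j]) for j in range(N))
                      for i in range(N)])
    return v

# Hypothetical two-state example (illustrative numbers only).
P = [[0.5, 0.5], [0.4, 0.6]]   # transition probabilities p_ij
R = [[9.0, 3.0], [3.0, -7.0]]  # transition rewards r_ij
v1 = total_reward_rec1(P, R, v0=[0.0, 0.0], n=1)
v2 = total_reward_rec1(P, R, v0=[0.0, 0.0], n=2)
```

With $\nu(0) = 0$, the first step returns the expected immediate reward of each state, and the second step adds the expected reward collected one step later.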

An equivalent form of this formula is

$$\nu_i(n) = \sum_{j=1}^{N} p_{ij} r_{ij} + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{2}$$

Denote by

$$q_i = \sum_{j=1}^{N} p_{ij} r_{ij} \tag{3}$$

the expected one-step reward of the process in state $i$.
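In matrix terms, (3) is the row sum of the elementwise product of $P$ and $R$, and one step of (2) is $q + P\nu(n-1)$. The sketch below assumes NumPy; the helper names `one_step_reward` and `step_eq2` and the numbers are hypothetical.

```python
import numpy as np

def one_step_reward(P, R):
    """Expected one-step reward (3): q_i = sum_j p_ij * r_ij."""
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    # Elementwise product, then sum over destination states j (along each row).
    return (P * R).sum(axis=1)

def step_eq2(P, R, v_prev):
    """One step of form (2): sum_j p_ij r_ij + sum_j p_ij v_j(n-1)."""
    P = np.asarray(P, dtype=float)
    return one_step_reward(P, R) + P @ np.asarray(v_prev, dtype=float)

# Hypothetical two-state example (illustrative numbers only).
P = [[0.5, 0.5], [0.4, 0.6]]
R = [[9.0, 3.0], [3.0, -7.0]]
q = one_step_reward(P, R)        # approximately [6, -3]
v1 = step_eq2(P, R, [0.0, 0.0])  # equals q when v(0) = 0
v2 = step_eq2(P, R, v1)          # approximately [7.5, -2.4]
```

Computing $q$ once and reusing it avoids recomputing $\sum_j p_{ij} r_{ij}$ at every step of the recurrence.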
Now we can write

$$\nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{4}$$

Taking the discount factor $\beta$ into account, we obtain

$$\nu_i(n,\beta) = q_i + \beta\sum_{j=1}^{N} p_{ij}\,\nu_j(n-1,\beta). \tag{5}$$

Let us write this formula in vector form:

$$\nu(n,\beta) = q + \beta P\,\nu(n-1,\beta), \qquad n = 1, 2, \ldots \tag{6}$$
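The vector recurrence (6) is a single matrix-vector product per step. A minimal NumPy sketch follows; the function name `discounted_reward`, the value $\beta = 0.9$ and the data are hypothetical.

```python
import numpy as np

def discounted_reward(P, q, beta, v0, n):
    """Iterate (6): v(n, beta) = q + beta * P @ v(n-1, beta), starting from v(0)."""
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    v = np.asarray(v0, dtype=float)
    for _ in range(n):
        v = q + beta * (P @ v)
    return v

# Hypothetical two-state example (illustrative numbers only).
P = [[0.5, 0.5], [0.4, 0.6]]
q = [6.0, -3.0]
v2_disc = discounted_reward(P, q, beta=0.9, v0=[0.0, 0.0], n=2)    # approximately [7.35, -2.46]
v2_undisc = discounted_reward(P, q, beta=1.0, v0=[0.0, 0.0], n=2)  # beta = 1 recovers (4)
```

Setting $\beta = 1$ reduces (6) to the undiscounted recurrence (4), which gives a quick consistency check.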

It is easy to see that

$$\nu(1,\beta) = q + \beta P\,\nu(0), \tag{7}$$

$$\nu(2,\beta) = q + \beta P\,\nu(1,\beta) = q + \beta P\left(q + \beta P\,\nu(0)\right) = q + \beta P q + \beta^2 P^2\,\nu(0). \tag{8}$$
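The unrolled expression (8) can be compared numerically with two applications of (6) for a nonzero $\nu(0)$. This sketch assumes NumPy; all numbers are hypothetical.

```python
import numpy as np

# Hypothetical data: a stochastic P, reward vector q, discount beta and start vector v(0).
P = np.array([[0.5, 0.5], [0.4, 0.6]])
q = np.array([6.0, -3.0])
beta = 0.9
v0 = np.array([1.0, 2.0])

# Two applications of recurrence (6), i.e. (7) followed by the first equality in (8).
v1 = q + beta * P @ v0
v2 = q + beta * P @ v1

# Unrolled right-hand side of (8): q + beta P q + beta^2 P^2 v(0).
v2_unrolled = q + beta * P @ q + beta**2 * np.linalg.matrix_power(P, 2) @ v0

print(np.allclose(v2, v2_unrolled))  # True
```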

In general, for $n \ge 1$,

$$\nu(n,\beta) = q + \beta^n P^n\,\nu(0) + \sum_{k=1}^{n-1} \beta^k P^k q.$$
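The general formula can be checked against direct iteration of (6) for several $n$. The sketch assumes NumPy; `closed_form` and `by_recurrence` are hypothetical helper names, and the formula is only used for $n \ge 1$, as stated above.

```python
import numpy as np

def closed_form(P, q, beta, v0, n):
    """v(n, beta) = q + beta^n P^n v(0) + sum_{k=1}^{n-1} beta^k P^k q, for n >= 1."""
    if n < 1:
        raise ValueError("the closed form is stated for n >= 1")
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    v0 = np.asarray(v0, dtype=float)
    total = q + beta**n * np.linalg.matrix_power(P, n) @ v0
    for k in range(1, n):
        total = total + beta**k * np.linalg.matrix_power(P, k) @ q
    return total

def by_recurrence(P, q, beta, v0, n):
    """Reference: apply v <- q + beta P v exactly n times, as in (6)."""
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    v = np.asarray(v0, dtype=float)
    for _ in range(n):
        v = q + beta * P @ v
    return v

# Hypothetical data (illustrative numbers only).
P = [[0.5, 0.5], [0.4, 0.6]]
q = [6.0, -3.0]
v0 = [1.0, 2.0]
agree = all(np.allclose(closed_form(P, q, 0.9, v0, n), by_recurrence(P, q, 0.9, v0, n))
            for n in range(1, 8))
```

The explicit powers $P^k$ make the structure of the sum visible; for long horizons, iterating (6) is cheaper.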

Taking into account that

$$q \equiv \beta^0 P^0 q$$