(see [18, 25]) that the following recurrence formula for the rewards holds:
$$\nu_i(n) = \sum_{j=1}^{N} p_{ij}\left[r_{ij} + \nu_j(n-1)\right]. \tag{1}$$
An equivalent form of this formula is
$$\nu_i(n) = \sum_{j=1}^{N} p_{ij} r_{ij} + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1), \tag{2}$$
where the first sum,
$$q_i = \sum_{j=1}^{N} p_{ij} r_{ij}, \tag{3}$$
is the expected one-step reward of the process.
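As a minimal sketch of definition (3), the one-step rewards can be computed row by row from the transition probabilities and the transition rewards. The function name `one_step_rewards`, the variable names `P` and `R`, and the two-state numbers below are illustrative assumptions, not taken from the text.

```python
def one_step_rewards(P, R):
    """Return q with q[i] = sum over j of P[i][j] * R[i][j], as in (3)."""
    return [
        sum(p * r for p, r in zip(p_row, r_row))
        for p_row, r_row in zip(P, R)
    ]


# Hypothetical two-state process (values chosen for illustration only).
P = [[0.5, 0.5],   # transition probabilities p_ij, each row sums to 1
     [0.4, 0.6]]
R = [[9.0, 3.0],   # rewards r_ij earned on the transition i -> j
     [3.0, -7.0]]

q = one_step_rewards(P, R)
print(q)
```

Each entry of `q` depends only on the corresponding rows of `P` and `R`, so it needs to be computed once before the recurrence is iterated.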
Now we can write
$$\nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{4}$$
Taking the discount factor β into account, we obtain
$$\nu_i(n,\beta) = q_i + \beta \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{5}$$
Let us write this formula in vector form:
$$\nu(n,\beta) = q + \beta\, P\, \nu(n-1), \qquad n = 0, 1, 2, \ldots \tag{6}$$
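The vector recurrence (6) can be iterated directly. The following is a minimal sketch in plain Python, assuming a starting vector ν(0) is given and the recurrence is applied for steps 1 through n. The function name `discounted_rewards` and the numerical values of `P`, `q`, and `beta` are hypothetical and serve only as an illustration.

```python
def discounted_rewards(P, q, beta, v0, n):
    """Apply v(k) = q + beta * P v(k-1) for k = 1..n, starting from v(0) = v0."""
    v = list(v0)
    for _ in range(n):
        # Component i of the new vector: q_i + beta * sum_j p_ij * v_j
        v = [
            q_i + beta * sum(p * v_j for p, v_j in zip(row, v))
            for q_i, row in zip(q, P)
        ]
    return v


# Hypothetical two-state data (illustration only).
P = [[0.5, 0.5],
     [0.4, 0.6]]
q = [6.0, -3.0]       # one-step rewards
beta = 0.5            # discount factor
v0 = [0.0, 0.0]       # boundary values v(0)

for n in range(4):
    print(n, discounted_rewards(P, q, beta, v0, n))
```

With `v0` equal to zero, the first step returns `q` itself, which matches the first unrolled line of the derivation.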
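It is easy to see that
$$\nu(1,\beta) = q + \beta P \nu(0),$$
$$\nu(2,\beta) = q + \beta P \nu(1) = q + \beta P\,\bigl(q + \beta P \nu(0)\bigr) = q + \beta P q + \beta^2 P^2 \nu(0), \tag{7}$$
and, in general,
$$\nu(n,\beta) = q + \beta^n P^n \nu(0) + \sum_{k=1}^{n-1} \beta^k P^k q. \tag{8}$$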
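As a numerical sanity check, the unrolled expression (8) can be compared with the step-by-step recurrence. The sketch below assumes n ≥ 1 (for n = 0 the sum is empty and the expression does not reduce to ν(0)). All function names and numerical values are hypothetical choices made for illustration.

```python
def mat_vec(P, x):
    """Matrix-vector product P x for a list-of-rows matrix."""
    return [sum(p * x_j for p, x_j in zip(row, x)) for row in P]


def recursive_value(P, q, beta, v0, n):
    """Iterate v(k) = q + beta * P v(k-1) for k = 1..n."""
    v = list(v0)
    for _ in range(n):
        v = [q_i + beta * s for q_i, s in zip(q, mat_vec(P, v))]
    return v


def closed_form_value(P, q, beta, v0, n):
    """Evaluate q + sum_{k=1}^{n-1} beta^k P^k q + beta^n P^n v0 for n >= 1."""
    if n < 1:
        raise ValueError("the unrolled form is stated for n >= 1")
    total = list(q)
    term = list(q)
    for _ in range(1, n):
        # After k passes, term holds beta^k P^k q.
        term = [beta * s for s in mat_vec(P, term)]
        total = [t + s for t, s in zip(total, term)]
    tail = list(v0)
    for _ in range(n):
        # After n passes, tail holds beta^n P^n v0.
        tail = [beta * s for s in mat_vec(P, tail)]
    return [t + s for t, s in zip(total, tail)]


# Hypothetical two-state data (illustration only).
P = [[0.5, 0.5],
     [0.4, 0.6]]
q = [6.0, -3.0]
beta = 0.9
v0 = [1.0, 2.0]

for n in range(1, 6):
    a = recursive_value(P, q, beta, v0, n)
    b = closed_form_value(P, q, beta, v0, n)
    print(n, max(abs(x - y) for x, y in zip(a, b)))
```

The printed differences should be at the level of floating-point rounding, since both functions evaluate the same quantity in a different order.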
Taking into account the fact that
$$q \equiv \beta^0 P^0 q$$