(see [18, 25]) that the following recurrence formula for rewards holds:
$$\nu_i(n) = \sum_{j=1}^{N} p_{ij}\,\bigl[r_{ij} + \nu_j(n-1)\bigr]. \tag{1}$$
Another form of this formula is:
$$\nu_i(n) = \sum_{j=1}^{N} p_{ij}\,r_{ij} + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1), \tag{2}$$
where the first sum,
$$q_i = \sum_{j=1}^{N} p_{ij}\,r_{ij}, \tag{3}$$
is the expected one-step reward of the process in state i.
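As an illustration of formula (3), the one-step rewards can be computed row by row from the transition probabilities and the transition rewards. The sketch below is only an example: the two-state matrices `P` and `R` and the function name `one_step_rewards` are assumptions chosen for illustration and do not come from the paper.

```python
# Hypothetical two-state example; the numbers are illustrative only.
P = [[0.5, 0.5],
     [0.4, 0.6]]    # p_ij: probability of moving from state i to state j
R = [[9.0, 3.0],
     [3.0, -7.0]]   # r_ij: reward received for the transition i -> j


def one_step_rewards(P, R):
    """Return the list q with q_i = sum_j p_ij * r_ij, as in formula (3)."""
    return [
        sum(p * r for p, r in zip(p_row, r_row))
        for p_row, r_row in zip(P, R)
    ]


q = one_step_rewards(P, R)
print(q)
```

For these illustrative values, q is approximately (6, -3).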
Now we can write
$$\nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{4}$$
Taking the discount factor β into account, we obtain:
$$\nu_i(n,\beta) = q_i + \beta \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{5}$$
Let us write this formula in vector form:
$$\nu(n,\beta) = q + \beta\,P\,\nu(n-1), \qquad n = 0, 1, 2, \ldots \tag{6}$$
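The vector recurrence (6) can be iterated directly from a starting vector ν(0). The following is a minimal sketch in plain Python. The function name `discounted_values`, the two-state matrix `P`, the vector `q`, the zero starting vector and the value β = 0.9 are all assumptions made for illustration, not values taken from the paper.

```python
def discounted_values(P, q, v0, beta, n):
    """Apply v(k) = q + beta * P v(k-1) for k = 1..n and return v(n)."""
    v = list(v0)
    for _ in range(n):
        # Component i of the new vector: q_i + beta * sum_j p_ij * v_j
        v = [
            q_i + beta * sum(p * v_j for p, v_j in zip(row, v))
            for q_i, row in zip(q, P)
        ]
    return v


# Hypothetical two-state data, for illustration only.
P = [[0.5, 0.5],
     [0.4, 0.6]]
q = [6.0, -3.0]

v1 = discounted_values(P, q, [0.0, 0.0], beta=0.9, n=1)
v2 = discounted_values(P, q, [0.0, 0.0], beta=0.9, n=2)
print(v1, v2)
```

With ν(0) = 0 the first step returns q itself, and the second step gives approximately (7.35, -2.46) for these illustrative numbers.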
It is easy to notice that
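$$\nu(1,\beta) = q + \beta P\,\nu(0)$$
$$\nu(2,\beta) = q + \beta P\,\nu(1) = q + \beta P\,\bigl(q + \beta P\,\nu(0)\bigr) = q + \beta P q + \beta^{2} P^{2}\,\nu(0) \tag{7}$$
$$\nu(n,\beta) = q + \beta^{n} P^{n}\,\nu(0) + \sum_{k=1}^{n-1} \beta^{k} P^{k} q \tag{8}$$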
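The unrolled expression (8) can be checked numerically against the step-by-step recurrence. The sketch below does this in plain Python for n ≥ 1. The helper names, the summation index k, and the two-state test data are assumptions introduced for illustration only.

```python
def mat_vec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]


def recursive_value(P, q, v0, beta, n):
    """Iterate v(k) = q + beta * P v(k-1) for k = 1..n."""
    v = list(v0)
    for _ in range(n):
        v = [q_i + beta * y for q_i, y in zip(q, mat_vec(P, v))]
    return v


def closed_form_value(P, q, v0, beta, n):
    """q + beta^n P^n v0 + sum_{k=1}^{n-1} beta^k P^k q, for n >= 1."""
    total = list(q)
    term = list(q)                      # holds beta^k P^k q, starting at k = 0
    for _ in range(1, n):               # k = 1 .. n-1
        term = [beta * y for y in mat_vec(P, term)]
        total = [t + y for t, y in zip(total, term)]
    tail = list(v0)                     # becomes beta^n P^n v0
    for _ in range(n):
        tail = [beta * y for y in mat_vec(P, tail)]
    return [t + y for t, y in zip(total, tail)]


# Hypothetical two-state data, for illustration only.
P = [[0.5, 0.5],
     [0.4, 0.6]]
q = [6.0, -3.0]
v0 = [1.0, 2.0]

max_gap = max(
    abs(a - b)
    for n in range(1, 7)
    for a, b in zip(recursive_value(P, q, v0, 0.9, n),
                    closed_form_value(P, q, v0, 0.9, n))
)
print(max_gap)
```

The printed gap should be zero up to floating-point rounding, which is what the expansion from (6) to (8) predicts.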
Taking into account the fact that
$$q \equiv \beta^{0} P^{0} q$$