AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



It is known (see [18, 25]) that the following recurrence formula for the rewards holds:

$$
\nu_i(n) = \sum_{j=1}^{N} p_{ij}\left[r_{ij} + \nu_j(n-1)\right]. \tag{1}
$$
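Recurrence (1) can be sketched in a few lines of Python. This is a minimal sketch, assuming the transition probabilities $p_{ij}$ and rewards $r_{ij}$ are given as nested lists `P` and `R` and the initial values $\nu_j(0)$ as a list `v0`. The function name `total_rewards` and the two-state numbers are hypothetical and serve only as an illustration.

```python
def total_rewards(P, R, v0, n):
    """Apply recurrence (1) n times: v_i(n) = sum_j p_ij * (r_ij + v_j(n-1))."""
    v = list(v0)
    N = len(P)
    for _ in range(n):
        # The right-hand side is built from the previous vector v = v(n-1).
        v = [sum(P[i][j] * (R[i][j] + v[j]) for j in range(N)) for i in range(N)]
    return v


# Hypothetical two-state example (illustrative numbers only).
P = [[0.5, 0.5], [0.4, 0.6]]
R = [[9.0, 3.0], [3.0, -7.0]]
v2 = total_rewards(P, R, [0.0, 0.0], 2)
print([round(x, 6) for x in v2])  # [7.5, -2.4]
```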

Another form of this formula is:

$$
\nu_i(n) = \sum_{j=1}^{N} p_{ij} r_{ij} + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{2}
$$

We define

$$
q_i = \sum_{j=1}^{N} p_{ij} r_{ij} \tag{3}
$$

as the expected one-step reward of the process in state $i$.
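As a worked illustration of (3), take a hypothetical two-state process with $p_{11} = p_{12} = 0.5$, $p_{21} = 0.4$, $p_{22} = 0.6$ and rewards $r_{11} = 9$, $r_{12} = 3$, $r_{21} = 3$, $r_{22} = -7$. Then $q_1 = 0.5 \cdot 9 + 0.5 \cdot 3 = 6$ and $q_2 = 0.4 \cdot 3 + 0.6 \cdot (-7) = -3$. The sketch below computes the same vector; the function name `one_step_rewards` is an illustrative assumption.

```python
def one_step_rewards(P, R):
    """Formula (3): q_i = sum_j p_ij * r_ij for every state i."""
    return [sum(p * r for p, r in zip(p_row, r_row)) for p_row, r_row in zip(P, R)]


# Hypothetical two-state data, used only for illustration.
P = [[0.5, 0.5], [0.4, 0.6]]
R = [[9.0, 3.0], [3.0, -7.0]]
q = one_step_rewards(P, R)
print([round(x, 6) for x in q])  # [6.0, -3.0]
```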
Substituting (3) into (2), we can now write

$$
\nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{4}
$$
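A quick numerical cross-check that (4) and (1) produce the same values: once $q$ is computed from (3), the rewards $r_{ij}$ enter the iteration only through $q$. This is a sketch on hypothetical data; the function names `values_via_q` and `values_via_rewards` and the starting vector are illustrative assumptions.

```python
def values_via_q(P, q, v0, n):
    """Recurrence (4): v_i(n) = q_i + sum_j p_ij * v_j(n-1)."""
    v = list(v0)
    N = len(P)
    for _ in range(n):
        v = [q[i] + sum(P[i][j] * v[j] for j in range(N)) for i in range(N)]
    return v


def values_via_rewards(P, R, v0, n):
    """Recurrence (1), kept here only to cross-check recurrence (4)."""
    v = list(v0)
    N = len(P)
    for _ in range(n):
        v = [sum(P[i][j] * (R[i][j] + v[j]) for j in range(N)) for i in range(N)]
    return v


# Hypothetical data; q is obtained from P and R by formula (3).
P = [[0.5, 0.5], [0.4, 0.6]]
R = [[9.0, 3.0], [3.0, -7.0]]
q = [sum(p * r for p, r in zip(pr, rr)) for pr, rr in zip(P, R)]
v0 = [1.0, -2.0]  # arbitrary initial values v(0)
a = values_via_q(P, q, v0, 10)
b = values_via_rewards(P, R, v0, 10)
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True
```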

Taking the discount factor $\beta$ into account, we obtain:

$$
\nu_i(n,\beta) = q_i + \beta \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1,\beta). \tag{5}
$$
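The discounted recurrence (5) differs from (4) only by the factor $\beta$ in front of the sum. A minimal sketch, assuming $P$ and $q$ are given as lists and using a hypothetical discount factor $\beta = 0.9$; the function name `discounted_values` and the data are illustrative only.

```python
def discounted_values(P, q, v0, beta, n):
    """Recurrence (5): v_i(n, beta) = q_i + beta * sum_j p_ij * v_j(n-1, beta)."""
    v = list(v0)
    N = len(P)
    for _ in range(n):
        v = [q[i] + beta * sum(P[i][j] * v[j] for j in range(N)) for i in range(N)]
    return v


# Hypothetical two-state data: P and q = (6, -3).
P = [[0.5, 0.5], [0.4, 0.6]]
q = [6.0, -3.0]
for n in (1, 10, 100, 200):
    print(n, [round(x, 4) for x in discounted_values(P, q, [0.0, 0.0], 0.9, n)])
```

With $\beta = 0.9$ the printed vectors stop changing visibly for large $n$; for these illustrative numbers they approach roughly $(15.4945,\ 5.6044)$, the vector that satisfies $v = q + \beta P v$.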

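In vector form, this formula reads

$$
\nu(n,\beta) = q + \beta P\,\nu(n-1,\beta), \qquad n = 1, 2, \dots, \tag{6}
$$

where $P = [p_{ij}]$ is the $N \times N$ transition matrix and $q = (q_1, \dots, q_N)^{T}$.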

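Iterating (6), it is easy to see that

$$
\begin{aligned}
\nu(1,\beta) &= q + \beta P\,\nu(0),\\
\nu(2,\beta) &= q + \beta P\,\nu(1,\beta) = q + \beta P\left(q + \beta P\,\nu(0)\right) = q + \beta P q + \beta^2 P^2 \nu(0),
\end{aligned} \tag{7}
$$

and, in general,

$$
\nu(n,\beta) = q + \beta^n P^n \nu(0) + \sum_{k=1}^{n-1} \beta^k P^k q. \tag{8}
$$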
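Formula (8) can be checked numerically against the iteration (6). The sketch below evaluates the right-hand side of (8) with the leading $q$ treated as the $k = 0$ term of the sum (since $P^0$ is the identity matrix) and compares it with direct iteration. The helper names `matvec`, `iterate` and `closed_form`, as well as the data, are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def iterate(P, q, v0, beta, n):
    """Vector recurrence (6), applied n times starting from v0 = v(0)."""
    v = list(v0)
    for _ in range(n):
        v = [qi + beta * pv for qi, pv in zip(q, matvec(P, v))]
    return v


def closed_form(P, q, v0, beta, n):
    """Formula (8) with q written as the k = 0 term:
    v(n, beta) = beta^n P^n v(0) + sum_{k=0}^{n-1} beta^k P^k q."""
    total = [0.0] * len(q)
    term = list(q)  # holds beta^k P^k q, starting from k = 0
    for _ in range(n):
        total = [t + s for t, s in zip(total, term)]
        term = [beta * x for x in matvec(P, term)]
    tail = list(v0)  # becomes beta^n P^n v(0)
    for _ in range(n):
        tail = [beta * x for x in matvec(P, tail)]
    return [t + s for t, s in zip(total, tail)]


# Hypothetical data for the comparison.
P = [[0.5, 0.5], [0.4, 0.6]]
q = [6.0, -3.0]
v0 = [1.0, -2.0]
for n in range(6):
    a, b = iterate(P, q, v0, 0.9, n), closed_form(P, q, v0, 0.9, n)
    print(n, all(abs(x - y) < 1e-9 for x, y in zip(a, b)))
```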

Taking into consideration the fact that

$$
q = \beta^0 P^0 q,
$$


