AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



(see [18, 25]) that the following recurrence formula for rewards holds:

\[
\nu_i(n) = \sum_{j=1}^{N} p_{ij} \left[ r_{ij} + \nu_j(n-1) \right]. \tag{1}
\]

Another form of this formula is:

\[
\nu_i(n) = \sum_{j=1}^{N} p_{ij} r_{ij} + \sum_{j=1}^{N} p_{ij} \nu_j(n-1), \tag{2}
\]

where we define

\[
q_i = \sum_{j=1}^{N} p_{ij} r_{ij} \tag{3}
\]

as the one-step reward of the process.
Now we can write

\[
\nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij} \nu_j(n-1). \tag{4}
\]
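As a rough illustration of recurrences (1) and (4), the following Python sketch iterates both forms for a small two-state process and checks that they agree. The transition matrix `P`, the reward matrix `R`, the starting values ν(0) = 0, and all function names are assumptions chosen for illustration only; they do not come from the source.

```python
def one_step_rewards(P, R):
    """q_i = sum_j p_ij * r_ij, as in equation (3)."""
    return [sum(p * r for p, r in zip(P_row, R_row))
            for P_row, R_row in zip(P, R)]


def rewards_direct(P, R, v0, n):
    """Iterate equation (1): v_i(n) = sum_j p_ij * (r_ij + v_j(n-1))."""
    v = list(v0)
    for _ in range(n):
        v = [sum(p * (r + vj) for p, r, vj in zip(P_row, R_row, v))
             for P_row, R_row in zip(P, R)]
    return v


def rewards_via_q(P, R, v0, n):
    """Iterate equation (4): v_i(n) = q_i + sum_j p_ij * v_j(n-1)."""
    q = one_step_rewards(P, R)
    v = list(v0)
    for _ in range(n):
        v = [qi + sum(p * vj for p, vj in zip(P_row, v))
             for qi, P_row in zip(q, P)]
    return v


# Hypothetical two-state example (values are illustrative assumptions).
P = [[0.5, 0.5], [0.4, 0.6]]    # transition probabilities p_ij
R = [[9.0, 3.0], [3.0, -7.0]]   # transition rewards r_ij
v0 = [0.0, 0.0]                 # assumed starting values v(0)

q = one_step_rewards(P, R)
v2_direct = rewards_direct(P, R, v0, 2)
v2_via_q = rewards_via_q(P, R, v0, 2)
print(q, v2_direct, v2_via_q)
```

Both functions should return the same vector up to rounding, since (4) is only a regrouping of the terms in (1).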

Taking the discount factor β into account, we obtain:

\[
\nu_i(n, \beta) = q_i + \beta \sum_{j=1}^{N} p_{ij} \nu_j(n-1). \tag{5}
\]

Let us write this formula in vector form:

\[
\nu(n, \beta) = q + \beta P \cdot \nu(n-1), \qquad n = 1, 2, \ldots \tag{6}
\]
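The vector recurrence (6) can be sketched in a few lines of plain Python. The values of `q`, `P`, `beta`, and ν(0) below are hypothetical, and the helper names are assumptions introduced only for this example.

```python
def mat_vec(P, v):
    """Matrix-vector product P * v for a matrix stored as a list of rows."""
    return [sum(p * vj for p, vj in zip(row, v)) for row in P]


def discounted_step(q, P, beta, v_prev):
    """One application of equation (6): v(n) = q + beta * P * v(n-1)."""
    Pv = mat_vec(P, v_prev)
    return [qi + beta * pvi for qi, pvi in zip(q, Pv)]


def discounted_rewards(q, P, beta, v0, n):
    """Apply equation (6) n times, starting from v(0) = v0."""
    v = list(v0)
    for _ in range(n):
        v = discounted_step(q, P, beta, v)
    return v


# Hypothetical data (illustrative assumptions, not from the source).
q = [6.0, -3.0]                # one-step rewards q_i
P = [[0.5, 0.5], [0.4, 0.6]]   # transition matrix
beta = 0.9                     # discount factor
v0 = [0.0, 0.0]                # assumed starting values v(0)

v1 = discounted_rewards(q, P, beta, v0, 1)
v2 = discounted_rewards(q, P, beta, v0, 2)
print(v1, v2)
```

With ν(0) = 0 the first step returns q itself, and each further step adds one more discounted term.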

It is easy to notice that

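\[
\nu(1) = q + \beta P \nu(0),
\]

\[
\nu(2) = q + \beta P \nu(1) = q + \beta P \left( q + \beta P \nu(0) \right) = q + \beta P q + \beta^{2} P^{2} \nu(0), \tag{7}
\]

\[
\nu(n, \beta) = q + \beta^{n} P^{n} \nu(0) + \sum_{k=1}^{n-1} \beta^{k} P^{k} q. \tag{8}
\]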
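A quick numerical check of the closed form (8) against the step-by-step recurrence (6) might look as follows. This is only a sketch: the data are hypothetical, the function names are assumptions, and the products β^k P^k q are accumulated by repeated matrix-vector multiplication rather than by forming the matrix powers explicitly.

```python
def mat_vec(A, x):
    """Matrix-vector product A * x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def iterate(q, P, beta, v0, n):
    """Apply v(n) = q + beta * P * v(n-1) n times, as in equation (6)."""
    v = list(v0)
    for _ in range(n):
        v = [qi + beta * pvi for qi, pvi in zip(q, mat_vec(P, v))]
    return v


def closed_form(q, P, beta, v0, n):
    """Equation (8) for n >= 1:
    v(n) = q + beta^n P^n v(0) + sum_{k=1}^{n-1} beta^k P^k q."""
    total = list(q)            # the leading term q
    term = list(q)
    for _ in range(1, n):      # k = 1, ..., n-1
        term = [beta * t for t in mat_vec(P, term)]   # beta^k P^k q
        total = [s + t for s, t in zip(total, term)]
    tail = list(v0)
    for _ in range(n):         # builds beta^n P^n v(0)
        tail = [beta * t for t in mat_vec(P, tail)]
    return [s + t for s, t in zip(total, tail)]


# Hypothetical data (illustrative assumptions, not from the source).
q = [6.0, -3.0]
P = [[0.5, 0.5], [0.4, 0.6]]
beta = 0.9
v0 = [1.0, 2.0]

v_iter = iterate(q, P, beta, v0, 5)
v_closed = closed_form(q, P, beta, v0, 5)
print(v_iter, v_closed)
```

The two results should coincide up to floating-point rounding for any n ≥ 1.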

Taking into consideration the fact that

\[
q = \beta^{0} P^{0} q
\]


