AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



(see [18, 25]) that the following recurrence formula for the rewards holds:

$$\nu_i(n) = \sum_{j=1}^{N} p_{ij}\,\bigl[r_{ij} + \nu_j(n-1)\bigr]. \tag{1}$$

Another form of this formula is:

$$\nu_i(n) = \sum_{j=1}^{N} p_{ij} r_{ij} + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{2}$$

Let

$$q_i = \sum_{j=1}^{N} p_{ij} r_{ij} \tag{3}$$

denote the expected one-step reward of the process in state i.
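The one-step reward in (3) is a row-wise weighted sum, which can be sketched in a few lines of Python. The function name `one_step_rewards` and the two-state matrices `P` and `R` are illustrative assumptions, not values taken from this paper.

```python
def one_step_rewards(P, R):
    """Return q with q[i] = sum_j P[i][j] * R[i][j], as in equation (3)."""
    return [
        sum(p_ij * r_ij for p_ij, r_ij in zip(p_row, r_row))
        for p_row, r_row in zip(P, R)
    ]


# Hypothetical two-state example (illustrative numbers only).
P = [[0.5, 0.5],
     [0.4, 0.6]]    # transition probabilities p_ij, each row sums to 1
R = [[9.0, 3.0],
     [3.0, -7.0]]   # rewards r_ij earned on the transition i -> j

q = one_step_rewards(P, R)
print(q)  # approximately [6.0, -3.0], up to floating-point rounding
```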
Now we can write

$$\nu_i(n) = q_i + \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{4}$$

Taking the discount factor β into account, we obtain:

$$\nu_i(n,\beta) = q_i + \beta \sum_{j=1}^{N} p_{ij}\,\nu_j(n-1). \tag{5}$$

Let us write this formula in vector form:

$$\nu(n,\beta) = q + \beta P\,\nu(n-1), \qquad n = 1, 2, \ldots \tag{6}$$
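The vector recursion (6) can be iterated directly. The following is a minimal sketch in plain Python, assuming a hypothetical two-state process; the names `discounted_step` and `discounted_rewards`, the values of `P` and `q`, the discount factor `beta = 0.9`, and the zero initial vector are all assumptions made for illustration.

```python
def discounted_step(q, P, v_prev, beta):
    """Apply equation (6) once: v(n) = q + beta * P v(n-1)."""
    return [
        q_i + beta * sum(p_ij * v_j for p_ij, v_j in zip(row, v_prev))
        for q_i, row in zip(q, P)
    ]


def discounted_rewards(q, P, v0, beta, n):
    """Iterate equation (6) n times, starting from the initial vector v0."""
    v = list(v0)
    for _ in range(n):
        v = discounted_step(q, P, v, beta)
    return v


# Hypothetical data (illustrative only).
P = [[0.5, 0.5],
     [0.4, 0.6]]
q = [6.0, -3.0]
beta = 0.9
v0 = [0.0, 0.0]

v1 = discounted_rewards(q, P, v0, beta, 1)  # equals q, since v0 = 0
v2 = discounted_rewards(q, P, v0, beta, 2)  # equals q + beta * P q
print(v1, v2)
```

With a zero initial vector, the first step returns q itself, and the second adds the discounted expected reward of the following step.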

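It is easy to see that

$$\begin{aligned}
\nu(1) &= q + \beta P\,\nu(0), \\
\nu(2) &= q + \beta P\,\nu(1) = q + \beta P\bigl(q + \beta P\,\nu(0)\bigr) = q + \beta P q + \beta^{2} P^{2}\,\nu(0),
\end{aligned} \tag{7}$$

and, in general,

$$\nu(n,\beta) = q + \beta^{n} P^{n}\,\nu(0) + \sum_{k=1}^{n-1} \beta^{k} P^{k} q. \tag{8}$$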
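As a numerical sanity check, the closed-form expression for ν(n, β) can be compared with the step-by-step recursion. The sketch below uses plain Python lists; the helper names (`mat_vec`, `by_recursion`, `by_closed_form`) and all numerical values are assumptions chosen for illustration, not data from the paper.

```python
def mat_vec(A, x):
    """Matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]


def by_recursion(q, P, v0, beta, n):
    """Iterate v(k) = q + beta * P v(k-1) for k = 1..n."""
    v = list(v0)
    for _ in range(n):
        v = [q_i + beta * y for q_i, y in zip(q, mat_vec(P, v))]
    return v


def by_closed_form(q, P, v0, beta, n):
    """Evaluate beta^n P^n v0 + sum_{k=0}^{n-1} beta^k P^k q.

    The k = 0 term of the sum is q itself, so for n >= 1 this equals
    q + beta^n P^n v0 + sum_{k=1}^{n-1} beta^k P^k q.
    """
    total = [0.0] * len(q)
    term = list(q)                      # beta^0 P^0 q
    for _ in range(n):
        total = [t + x for t, x in zip(total, term)]
        term = [beta * y for y in mat_vec(P, term)]   # next power
    tail = list(v0)
    for _ in range(n):
        tail = [beta * y for y in mat_vec(P, tail)]   # beta^n P^n v0
    return [t + x for t, x in zip(total, tail)]


# Hypothetical data (illustrative only).
P = [[0.5, 0.5],
     [0.4, 0.6]]
q = [6.0, -3.0]
beta = 0.9
v0 = [1.0, 2.0]
n = 5

v_iter = by_recursion(q, P, v0, beta, n)
v_closed = by_closed_form(q, P, v0, beta, n)
max_gap = max(abs(a - b) for a, b in zip(v_iter, v_closed))
print(max_gap)  # expected to be at floating-point rounding level
```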

Taking into consideration the fact that

$$q = \beta^{0} P^{0} q$$


