AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



discount factor β.¹ This factor makes it possible to calculate finite expected rewards that arise in various economic and technical processes over long periods of time (theoretically without a time limit), provided that the mathematical model of these real processes is an MDP.
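The role of the discount factor in keeping rewards finite can be seen from the geometric series: even an infinite stream of constant rewards r has the finite present value r / (1 − β) when β < 1. A minimal numerical check (our illustration, not taken from the paper):

```python
# Illustrative check: with discount factor beta < 1, an infinite stream
# of constant rewards r has finite total discounted value r / (1 - beta).
beta, r = 0.9, 5.0

partial = sum(beta**n * r for n in range(1000))   # truncated series
closed_form = r / (1.0 - beta)

print(partial, closed_form)   # both approximately 50.0
```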

The stochastic transition matrix P of an irreducible Markov chain with a finite set of states generates several matrices that are very important for the analysis of such processes: the ergodic, fundamental and potential matrices. Very simple yet mathematically rigorous methods for calculating these matrices are widely discussed in [14, 18, 42]. The methods presented there are in most cases numerical iterative algorithms based on Howard's algorithm. A well-known analytical method exists which calculates the ergodic matrix for t → ∞, but it cannot analyse the complete process, including the disturbances of the transient process that occur during the initial development period of a Markov decision process with or without discounting.
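For concreteness, the ergodic and fundamental matrices mentioned above can be computed directly for a small chain. The sketch below uses standard definitions (the ergodic matrix as the limit of Pⁿ, and the fundamental matrix Z = (I − P + P*)⁻¹ in the sense of Kemeny and Snell); the example matrix and variable names are ours, not the paper's:

```python
import numpy as np

# Hypothetical irreducible, aperiodic chain with two states.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Ergodic matrix P_star as the limit of P**n (converges geometrically here);
# every row equals the stationary distribution.
P_star = np.linalg.matrix_power(P, 200)

I = np.eye(2)
# Fundamental matrix in the Kemeny-Snell sense.
Z = np.linalg.inv(I - P + P_star)

print(P_star)   # each row approximately [4/7, 3/7]
print(Z)
```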

The goal of this paper is to present an analytical method for calculating the ergodic matrix and the so-called difference matrices of a discounted finite-state Markov decision chain.

It allows the total process to be analysed over the range t ∈ [0, ∞) by separating it into two parts: a constant part, represented by the ergodic matrix, and a variable part, represented by the difference matrices.
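This separation can be illustrated numerically. Using the standard identity Σₙ βⁿPⁿ = (I − βP)⁻¹ and the ergodic matrix P*, one has Σₙ βⁿPⁿ = P*/(1 − β) + Σₙ βⁿ(Pⁿ − P*): the first term is the constant ergodic part, while the summands Pⁿ − P* decay to zero. The names and the example chain below are our illustration, not the paper's notation:

```python
import numpy as np

beta = 0.9
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
P_star = np.linalg.matrix_power(P, 200)    # ergodic matrix (limit of P**n)

# Total discounted sum: sum_{n>=0} beta**n P**n = (I - beta P)^{-1}
total = np.linalg.inv(np.eye(2) - beta * P)

# Constant (ergodic) part plus decaying (difference) part.
ergodic_part = P_star / (1.0 - beta)
difference_part = sum(beta**n * (np.linalg.matrix_power(P, n) - P_star)
                      for n in range(200))

print(np.allclose(total, ergodic_part + difference_part))   # True
```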

The paper is organized as follows. Section 2 reminds the reader of the derivation of the general formula for the total expected reward for t → ∞, given the matrix P and β ∈ [0, 1); the derivation relies on the known formula for the total expected reward when the initial state is given. In Section 3 a theorem on the existence of the ergodic matrix and the associated difference matrices is formulated and proved; these matrices always exist for β < 1. In Section 4 two simple examples illustrating the computational method are solved. In Section 5 a new performance index used for the optimization of discounted Markov decision processes is interpreted.

2 Total expected reward with discount

We consider an ergodic Markov chain with a finite set of N states and a given stochastic matrix P = [pij], i, j = 1, ..., N. We also have a one-step reward matrix R = [rij], i, j = 1, ..., N, which is controlled by the Markov chain. Let νi(n), i = 1, ..., N, n = 0, 1, 2, ... denote the total expected reward of the process when the initial state is state i and the system is stopped after n steps (transitions). Then we can show

¹ A discount factor β < 1 means that a reward unit achieved at moment t = k has, at moment t = k + n, the value βⁿ.
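Although the formula announced above is not reproduced in this excerpt, the quantities just defined admit a standard construction: the vector of one-step expected rewards q, with qi = Σj pij rij, drives the recursion ν(n) = q + βPν(n−1), whose limit for β < 1 is (I − βP)⁻¹q. The sketch below is our illustration of that standard construction; the matrices P and R are hypothetical:

```python
import numpy as np

beta = 0.8
P = np.array([[0.7, 0.3],          # stochastic transition matrix
              [0.4, 0.6]])
R = np.array([[5.0, 1.0],          # one-step reward matrix
              [2.0, 4.0]])

# Expected one-step reward from each state: q_i = sum_j p_ij * r_ij
q = (P * R).sum(axis=1)            # [3.8, 3.2]

# Total expected discounted reward after n steps: v(n) = q + beta * P @ v(n-1)
v = np.zeros(2)
for _ in range(200):
    v = q + beta * P @ v

# For beta < 1 the recursion converges to the closed form (I - beta P)^{-1} q.
v_inf = np.linalg.solve(np.eye(2) - beta * P, q)
print(v, v_inf)
```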


