AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



discount factor β.¹ This factor makes it possible to calculate finite expected rewards that arise in various economic and technical processes over a long period of time (theoretically with no time limit), provided the mathematical model of these real processes is an MDP.

The stochastic transition matrix P of an irreducible Markov chain with a finite set of states generates several matrices that are very important for the analysis of such processes: the ergodic, fundamental, and potential matrices. Simple yet mathematically rigorous methods for calculating these matrices are widely discussed in [14, 18, 42]. The methods presented there are mostly numerical iterative algorithms based on Howard's algorithms. A well-known analytical method exists that calculates the ergodic matrix for t → ∞, but it cannot be used to analyse the complete process, including the disturbances of the transient process that occur during the initial development period of a Markov decision process with or without discounting.
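The ergodic and fundamental matrices mentioned above can be computed numerically for a given irreducible chain. The following sketch (with a hypothetical 2-state transition matrix chosen purely for illustration) finds the stationary distribution, builds the ergodic matrix whose rows all equal that distribution, and then forms the fundamental matrix Z = (I − P + P*)⁻¹ in the Kemeny–Snell sense; this is standard Markov-chain theory, not the analytical method proposed in this paper:

```python
import numpy as np

# Hypothetical irreducible stochastic matrix, for illustration only.
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])
n = P.shape[0]

# Stationary distribution pi solves pi @ P = pi with sum(pi) = 1;
# we stack the normalization condition onto the linear system.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Ergodic matrix: every row equals the stationary distribution pi.
P_star = np.tile(pi, (n, 1))

# Fundamental matrix in the Kemeny-Snell sense: Z = (I - P + P*)^(-1).
Z = np.linalg.inv(np.eye(n) - P + P_star)

print(P_star)
print(Z)
```

For an irreducible aperiodic chain, P_star also equals the limit of Pᵗ as t → ∞, which gives a quick numerical sanity check on the computation.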

The goal of this paper is to present an analytical method for calculating the ergodic matrix and the so-called difference matrices of a discounted finite-state Markov decision chain.

This allows the total process to be analysed over the range t ∈ [0, ∞) by separating it into two parts: a constant part, represented by the ergodic matrix, and a variable part, represented by the difference matrices.
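The idea of the constant-plus-variable split can be illustrated numerically: the t-step transition matrix Pᵗ decomposes into the ergodic matrix P* plus a transient remainder that vanishes as t grows. The sketch below uses a hypothetical 3-state chain and a plain eigenvector computation; it illustrates the decomposition itself, not the paper's analytical method for obtaining the difference matrices:

```python
import numpy as np

# Hypothetical 3-state irreducible stochastic matrix (illustrative only).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized so its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()

# Ergodic matrix: every row equals the stationary distribution.
P_star = np.tile(pi, (P.shape[0], 1))

# Variable part at step t: the transient remainder of P^t.
def difference_matrix(P, P_star, t):
    return np.linalg.matrix_power(P, t) - P_star

# The transient part decays geometrically with t.
print(np.max(np.abs(difference_matrix(P, P_star, 1))))
print(np.max(np.abs(difference_matrix(P, P_star, 50))))  # close to 0
```

The decay rate of the variable part is governed by the subdominant eigenvalues of P, which is why the ergodic part alone describes the long-run behaviour.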

The paper is organized as follows. Section 2 reminds the reader of the derivation of the general formula for the total expected reward as t → ∞, given the matrix P and β ∈ [0, 1); the derivation relies on the known formula for the total expected reward when the initial state is specified. In Section 3 a theorem on the existence of the ergodic matrix and the associated difference matrices is formulated and proved; these matrices always exist for β < 1. In Section 4 two simple examples illustrating the computational method are solved. In Section 5 a new performance index used for the optimization of discounted Markov decision processes is interpreted.

2 Total expected reward with discount

We consider an ergodic Markov chain with a finite set of N states and a given stochastic matrix P = [pij], i, j = 1, …, N. We also have a one-step reward matrix R = [rij], i, j = 1, …, N, which is controlled by the Markov chain. Let νi(n), i = 1, …, N, n = 0, 1, 2, …, denote the total expected reward of the process when the initial state is state i and the system is stopped after n steps (transitions). Then we can show

1 A discount factor β < 1 means that a unit of reward obtained at moment t = k has, at moment t = k + n, the value βn.
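The quantities νi(n) defined above satisfy the standard recursion for discounted expected rewards: the expected one-step reward from each state is qi = Σj pij rij, and ν(n) = q + βP ν(n − 1). The sketch below implements this recursion with hypothetical values of P, R, and β chosen only for illustration; as n → ∞ it converges to the known closed form (I − βP)⁻¹q:

```python
import numpy as np

# Hypothetical data for a 2-state chain (illustrative only):
# transition matrix P and one-step reward matrix R.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
R = np.array([[5.0, 9.0],
              [3.0, -7.0]])
beta = 0.9  # discount factor, beta < 1

# Expected one-step reward from each state: q_i = sum_j p_ij * r_ij.
q = (P * R).sum(axis=1)

# Recursion for the total expected discounted reward over n steps,
# starting from each state: v(n) = q + beta * P @ v(n-1), v(0) = 0.
def total_expected_reward(q, P, beta, n):
    v = np.zeros(len(q))
    for _ in range(n):
        v = q + beta * P @ v
    return v

# Closed-form limit as n -> infinity: (I - beta*P)^(-1) @ q.
v_inf = np.linalg.solve(np.eye(len(q)) - beta * P, q)
print(total_expected_reward(q, P, beta, 500))
print(v_inf)
```

Because β < 1, the recursion is a contraction, so the finite-horizon rewards converge geometrically to the infinite-horizon limit; this is why the discount factor guarantees finite expected rewards over an unbounded time horizon.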


