AN ANALYTICAL METHOD TO CALCULATE THE ERGODIC AND DIFFERENCE MATRICES OF THE DISCOUNTED MARKOV DECISION PROCESSES



discount factor β.¹ This factor makes it possible to calculate finite expected rewards that arise in various economic and technical processes over a long period of time (theoretically with no time limit), provided the mathematical model of these real processes is an MDP.

The stochastic transition matrix P of an irreducible Markov chain with a finite set of states generates several matrices that are very important for the analysis of such processes: the ergodic, fundamental, and potential matrices. Simple yet mathematically rigorous methods for calculating these matrices are widely discussed in [14, 18, 42]. The methods presented there are mostly numerical iterative algorithms based on Howard's algorithms. A well-known analytical method exists that calculates the ergodic matrix for t → ∞, but it cannot be used to analyse the complete process, including the disturbances of the transient process that occur during the initial development period of a Markov decision process with or without discounting.
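The ergodic and fundamental matrices mentioned above can be computed numerically for a given irreducible chain. The following sketch (with a hypothetical 2-state transition matrix chosen purely for illustration) finds the stationary distribution, builds the ergodic matrix whose rows all equal that distribution, and then forms the fundamental matrix Z = (I − P + P*)⁻¹ in the Kemeny–Snell sense; this is standard Markov-chain theory, not the analytical method proposed in this paper:

```python
import numpy as np

# Hypothetical irreducible stochastic matrix, for illustration only.
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])
n = P.shape[0]

# Stationary distribution pi solves pi @ P = pi with sum(pi) = 1;
# we stack the normalization condition onto the linear system.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Ergodic matrix: every row equals the stationary distribution pi.
P_star = np.tile(pi, (n, 1))

# Fundamental matrix in the Kemeny-Snell sense: Z = (I - P + P*)^(-1).
Z = np.linalg.inv(np.eye(n) - P + P_star)

print(P_star)
print(Z)
```

For an irreducible aperiodic chain, P_star also equals the limit of Pᵗ as t → ∞, which gives a quick numerical sanity check on the computation.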

The goal of this paper is to present an analytical method for calculating the ergodic matrix and the so-called difference matrices of a discounted finite-state Markov decision chain.

This allows the total process to be analysed over the range t ∈ [0, ∞) by separating it into two parts: a constant part, represented by the ergodic matrix, and a variable part, represented by the difference matrices.
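The idea of the constant-plus-variable split can be illustrated numerically: the t-step transition matrix Pᵗ decomposes into the ergodic matrix P* plus a transient remainder that vanishes as t grows. The sketch below uses a hypothetical 3-state chain and a plain eigenvector computation; it illustrates the decomposition itself, not the paper's analytical method for obtaining the difference matrices:

```python
import numpy as np

# Hypothetical 3-state irreducible stochastic matrix (illustrative only).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized so its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()

# Ergodic matrix: every row equals the stationary distribution.
P_star = np.tile(pi, (P.shape[0], 1))

# Variable part at step t: the transient remainder of P^t.
def difference_matrix(P, P_star, t):
    return np.linalg.matrix_power(P, t) - P_star

# The transient part decays geometrically with t.
print(np.max(np.abs(difference_matrix(P, P_star, 1))))
print(np.max(np.abs(difference_matrix(P, P_star, 50))))  # close to 0
```

The decay rate of the variable part is governed by the subdominant eigenvalues of P, which is why the ergodic part alone describes the long-run behaviour.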

The paper is organized as follows. Section 2 reminds the reader of the derivation of the general formula for the total expected reward as t → ∞, given the matrix P and β ∈ [0, 1); the derivation relies on the known formula for the total expected reward when the initial state is specified. In Section 3 a theorem on the existence of the ergodic matrix and the associated difference matrices is formulated and proved; these matrices always exist for β < 1. In Section 4 two simple examples illustrating the computational method are solved. In Section 5 a new performance index used for the optimization of discounted Markov decision processes is interpreted.

2 Total expected reward with discount

We consider an ergodic Markov chain with a finite set of N states and a given stochastic matrix P = [pij], i, j = 1, …, N. We also have a one-step reward matrix R = [rij], i, j = 1, …, N, which is controlled by the Markov chain. Let νi(n), i = 1, …, N, n = 0, 1, 2, …, denote the total expected reward of the process when the initial state is state i and the system is stopped after n steps (transitions). Then we can show

1 A discount factor β < 1 means that a unit of reward obtained at moment t = k has, at moment t = k + n, the value βn.
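The quantities νi(n) defined above satisfy the standard recursion for discounted expected rewards: the expected one-step reward from each state is qi = Σj pij rij, and ν(n) = q + βP ν(n − 1). The sketch below implements this recursion with hypothetical values of P, R, and β chosen only for illustration; as n → ∞ it converges to the known closed form (I − βP)⁻¹q:

```python
import numpy as np

# Hypothetical data for a 2-state chain (illustrative only):
# transition matrix P and one-step reward matrix R.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
R = np.array([[5.0, 9.0],
              [3.0, -7.0]])
beta = 0.9  # discount factor, beta < 1

# Expected one-step reward from each state: q_i = sum_j p_ij * r_ij.
q = (P * R).sum(axis=1)

# Recursion for the total expected discounted reward over n steps,
# starting from each state: v(n) = q + beta * P @ v(n-1), v(0) = 0.
def total_expected_reward(q, P, beta, n):
    v = np.zeros(len(q))
    for _ in range(n):
        v = q + beta * P @ v
    return v

# Closed-form limit as n -> infinity: (I - beta*P)^(-1) @ q.
v_inf = np.linalg.solve(np.eye(len(q)) - beta * P, q)
print(total_expected_reward(q, P, beta, 500))
print(v_inf)
```

Because β < 1, the recursion is a contraction, so the finite-horizon rewards converge geometrically to the infinite-horizon limit; this is why the discount factor guarantees finite expected rewards over an unbounded time horizon.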


