5 Conclusions and comments
Identical results for ν∞(β) would be obtained by using formula (10) directly (passing over the difficulties connected with inverting the matrix (I - βP)). Using formula (19) we can separate two components of the total reward: a constant component, connected with the factor 1/(1 - β) and the ergodic matrix, and a variable component, which represents the part of ν∞(β) that arises under the influence of the unsteady transient process. The effect of this process is especially visible during the initial phase of the Markov Decision Process. The value of this component of ν∞(β) rises with a decrease of the discount factor β and with an increase of the disturbances generated by the matrix P. The two presented examples show this.
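As an illustration, the following sketch in Python (NumPy) evaluates the total reward and its decomposition numerically. It takes formula (10) to be ν∞(β) = (I - βP)⁻¹q for a reward vector here called q, and it assumes that formula (19) is the spectral decomposition of (I - βP)⁻¹ for a diagonalisable P with eigenvalues α₁ = 1, α₂, ..., α_N; the matrix P, the vector q and the value of β used below are illustrative placeholders, not the data of the paper's examples.

import numpy as np

def total_discounted_reward(P, q, beta):
    # Formula (10): nu_inf(beta) = (I - beta*P)^(-1) q, evaluated by solving a
    # linear system instead of forming the inverse matrix explicitly.
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - beta * P, q)

def reward_components(P, q, beta):
    # One term per eigenvalue of P:  Z_k q / (1 - alpha_k * beta), where Z_k is
    # the spectral projector of alpha_k.  The first term (alpha_1 = 1) is the
    # constant, ergodic component with the 1/(1 - beta) factor; the remaining
    # terms form the variable component produced by the transient process.
    alpha, V = np.linalg.eig(P)        # eigenvalues and right eigenvectors
    W = np.linalg.inv(V)               # rows of W are the left eigenvectors
    order = np.argsort(-alpha.real)    # put the eigenvalue alpha_1 = 1 first
    alpha, V, W = alpha[order], V[:, order], W[order, :]
    return [np.outer(V[:, k], W[k, :]) @ q / (1.0 - alpha[k] * beta)
            for k in range(P.shape[0])]

# Illustrative data only.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
q = np.array([1.0, -2.0, 3.0])
parts = reward_components(P, q, 0.5)
assert np.allclose(sum(parts), total_discounted_reward(P, q, 0.5))

The check in the last line confirms that the sum of the components reproduces the total reward computed directly from formula (10).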
Relying on the above observations, we can create a performance index of the tested Discounted Markov Decision Processes. Let ν∞1(β) denote the component standing in front of the factor 1/(1 - β), and let ν∞k(β), k = 2, 3, ..., N, denote the components standing in front of the factors 1/(1 - α_kβ). Then the performance index of the considered Markov chain can have the following form:
J(β) = [J_i(β)] = [ ( Σ_{k=2,...,N} ν∞k,i(β) ) / ν∞1,i(β) ]                    (20)
From the definition of the coefficient J(β) it follows that, for a given β, the closer the absolute value of this coefficient is to zero, the better the properties of the tested Markov process. For the two presented examples the values of the coefficients amount to:
J(0.5) = [0.151, -0.151, -1.005]ᵀ,   J(0.99) = [0.002, -0.002, -1.032]ᵀ
for the first example, and
J(0.5) = [2.630, -2.104]ᵀ,   J(0.99) = [0.05, -0.04]ᵀ
for the second example.
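Under the same assumptions, formula (20) can be sketched on top of the reward_components function above; we read each ν∞k,i(β) as the whole k-th term of the decomposition, its 1/(1 - α_kβ) factor included:

def performance_index(P, q, beta):
    # Formula (20): for each state i, the summed transient components divided
    # by the ergodic component; values close to zero mean that the transient
    # process contributes little to the total reward of that state.
    parts = reward_components(P, q, beta)
    return np.real(sum(parts[1:]) / parts[0])

# For the illustrative data above:
print(performance_index(P, q, 0.5), performance_index(P, q, 0.99))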
We observe that a given stochastic matrix P always generates the same transient process for n = 0, 1, 2, ... (that is, the powers Pⁿ). The calculations show that the effect of this process depends on the value of β. Hence the optimisation of a Discounted Markov Decision Process can rely on the selection of an adequately large factor β for a given quality coefficient. Usually, however, β is given and depends on various economic and technical
conditions. In that case the optimisation of the MDP can rely on the selection of an adequate matrix P.
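For the first of these two possibilities, the selection of an adequately large β for a given quality requirement can be sketched as a simple scan over candidate values of the index from formula (20); the tolerance and the grid of candidates are arbitrary illustrative choices:

def smallest_adequate_beta(P, q, tol, candidates=np.linspace(0.5, 0.99, 50)):
    # Return the smallest candidate beta for which every |J_i(beta)| stays
    # below the tolerance tol, or None when no candidate satisfies it.
    for beta in candidates:
        if np.max(np.abs(performance_index(P, q, beta))) < tol:
            return beta
    return None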