P; that is, the control strategy should minimize the quality coefficient when the
factor β is given.
The aspects mentioned above will be the subject of future papers. The analysis
of the MDP leads to the following conclusions:
1. Formula (19) makes it possible to calculate the total reward ν∞k (β)
analytically, without the difficult inversion of the matrix (I - βP).
Numerical matrix inversion on a computer proceeds iteratively, and the
number of iterations grows rapidly with the matrix size N. This leads to
a loss of computational accuracy.
2. The proposed analytical method for calculating the ergodic and difference
matrices makes it possible to distinguish two components of the total
reward. This broadens the scope for analyzing the discounted Markov
decision process.
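
The two conclusions above can be illustrated on the smallest non-trivial case. The sketch below is not formula (19), which is not reproduced in this section. It assumes a two-state chain with P = [[1-a, a], [b, 1-b]], for which the standard closed form P^n = A + (1-a-b)^n T holds, where A is the limiting (ergodic) matrix and T the transient matrix. The function names, the reward vector q and the numerical values are hypothetical choices made only for illustration.

```python
def total_reward_direct(a, b, q, beta):
    """Total discounted reward v = (I - beta*P)^(-1) q by explicit 2x2 inversion.

    Assumes the two-state chain P = [[1-a, a], [b, 1-b]].
    """
    # Entries of M = I - beta*P
    m11 = 1 - beta * (1 - a)
    m12 = -beta * a
    m21 = -beta * b
    m22 = 1 - beta * (1 - b)
    det = m11 * m22 - m12 * m21
    # v = M^(-1) q using the adjugate of the 2x2 matrix
    v1 = (m22 * q[0] - m12 * q[1]) / det
    v2 = (-m21 * q[0] + m11 * q[1]) / det
    return [v1, v2]


def total_reward_decomposed(a, b, q, beta):
    """Same total reward, split into an ergodic part and a transient part.

    Uses P^n = A + lam^n * T with lam = 1 - a - b, so that
    sum_n beta^n P^n = A / (1 - beta) + T / (1 - beta * lam).
    No matrix inversion is needed.
    """
    s = a + b
    lam = 1 - s
    # A q: both rows of A equal the stationary distribution (b/s, a/s)
    g = (b * q[0] + a * q[1]) / s
    ergodic = [g / (1 - beta), g / (1 - beta)]
    # T q, with T = (1/s) * [[a, -a], [-b, b]]
    d = (q[0] - q[1]) / s
    scale = 1 / (1 - beta * lam)
    transient = [a * d * scale, -b * d * scale]
    return ergodic, transient


if __name__ == "__main__":
    a, b, beta = 0.5, 0.4, 0.9   # illustrative values (assumption)
    q = [6.0, -3.0]              # illustrative one-step rewards (assumption)
    direct = total_reward_direct(a, b, q, beta)
    ergodic, transient = total_reward_decomposed(a, b, q, beta)
    print("direct inversion:", direct)
    print("ergodic part:    ", ergodic)
    print("transient part:  ", transient)
```

Adding the ergodic and transient parts element by element should reproduce the result of the direct inversion, which gives a simple consistency check on any analytical decomposition of this kind.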