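$$\det(I - \beta P) = (1 - 0.5\beta)(1 - \beta^2) = (1 - \beta)(1 + \beta)(1 - 0.5\beta)$$

$$(I - \beta P)^{-1} =
\begin{bmatrix}
\dfrac{1}{1-\beta^2} & \dfrac{\beta}{1-\beta^2} & 0 \\[2ex]
\dfrac{\beta}{1-\beta^2} & \dfrac{1}{1-\beta^2} & 0 \\[2ex]
\dfrac{0.5\beta^2}{(1-\beta^2)(1-0.5\beta)} & \dfrac{0.5\beta}{(1-\beta^2)(1-0.5\beta)} & \dfrac{1}{1-0.5\beta}
\end{bmatrix}$$

$$= \frac{1}{1-\beta}
\begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \\ 0.5 & 0.5 & 0 \end{bmatrix}
+ \frac{1}{1-0.5\beta}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -\frac{2}{3} & -\frac{1}{3} & 1 \end{bmatrix}
+ \frac{1}{1+\beta}
\begin{bmatrix} 0.5 & -0.5 & 0 \\ -0.5 & 0.5 & 0 \\ \frac{1}{6} & -\frac{1}{6} & 0 \end{bmatrix}$$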
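The partial-fraction expansion can be checked numerically. The following is a minimal sketch in exact rational arithmetic. It assumes a transition matrix `P` with rows (0, 1, 0), (1, 0, 0), (0, 0.5, 0.5); this matrix is not stated in this section and is inferred from the determinant and the inverse, so treat it as an assumption. The names `inverse`, `resolvent_direct` and `resolvent_expansion` are illustrative only.

```python
from fractions import Fraction as F

# ASSUMPTION: transition matrix inferred from det(I - beta*P) and the inverse
# given in the text; it is not stated explicitly in this section.
P = [[F(0), F(1), F(0)],
     [F(1), F(0), F(0)],
     [F(0), F(1, 2), F(1, 2)]]

# Coefficient matrices of the expansion, multiplying
# 1/(1 - beta), 1/(1 + beta) and 1/(1 - 0.5*beta) respectively.
A = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(1, 2), F(0)]]
B = [[F(1, 2), F(-1, 2), F(0)],
     [F(-1, 2), F(1, 2), F(0)],
     [F(1, 6), F(-1, 6), F(0)]]
C = [[F(0), F(0), F(0)],
     [F(0), F(0), F(0)],
     [F(-2, 3), F(-1, 3), F(1)]]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(m)
    # Augment m with the identity matrix.
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resolvent_direct(beta):
    """(I - beta*P)^(-1) computed by direct inversion."""
    n = len(P)
    m = [[F(int(i == j)) - beta * P[i][j] for j in range(n)]
         for i in range(n)]
    return inverse(m)


def resolvent_expansion(beta):
    """(I - beta*P)^(-1) computed from the partial-fraction expansion."""
    a, b, c = 1 / (1 - beta), 1 / (1 + beta), 1 / (1 - beta / 2)
    return [[a * A[i][j] + b * B[i][j] + c * C[i][j] for j in range(3)]
            for i in range(3)]


if __name__ == "__main__":
    for beta in (F(1, 2), F(99, 100)):
        print(beta, resolvent_direct(beta) == resolvent_expansion(beta))
```

Because `Fraction` is exact, the comparison does not depend on a floating-point tolerance. The check is valid for any β other than 1, −1 and 2, where the determinant vanishes.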
Hence
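$$\nu_\infty(\beta) = (I - \beta P)^{-1} q
= \frac{1}{1-\beta} \begin{bmatrix} 5.5 \\ 5.5 \\ 5.5 \end{bmatrix}
+ \frac{1}{1+\beta} \begin{bmatrix} 2.5 \\ -2.5 \\ \frac{5}{6} \end{bmatrix}
+ \frac{1}{1-0.5\beta} \begin{bmatrix} 0 \\ 0 \\ -\frac{28}{3} \end{bmatrix}$$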
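The closed form for the value vector can be evaluated for any admissible β. The sketch below does this with exact fractions. The three coefficient vectors are taken from the expression above, with the last one read as (0, 0, −28/3); the function name `nu_infinity` is illustrative. Under these coefficients, evaluating at β = 0 returns the reward vector q itself, which comes out as (8, 3, −3); this value of q is inferred rather than quoted from the text.

```python
from fractions import Fraction as F

# Coefficient vectors of the closed form, multiplying
# 1/(1 - beta), 1/(1 + beta) and 1/(1 - 0.5*beta) respectively.
# ASSUMPTION: the last vector is read as (0, 0, -28/3).
V_ONE_MINUS_BETA = [F(11, 2), F(11, 2), F(11, 2)]
V_ONE_PLUS_BETA = [F(5, 2), F(-5, 2), F(5, 6)]
V_ONE_MINUS_HALF_BETA = [F(0), F(0), F(-28, 3)]


def nu_infinity(beta):
    """Total expected discounted reward vector for 0 <= beta < 1.

    `beta` may be an int, a Fraction, or a decimal string such as "0.99".
    """
    beta = F(beta)
    a = 1 / (1 - beta)
    b = 1 / (1 + beta)
    c = 1 / (1 - beta / 2)
    return [a * x + b * y + c * z
            for x, y, z in zip(V_ONE_MINUS_BETA,
                               V_ONE_PLUS_BETA,
                               V_ONE_MINUS_HALF_BETA)]


if __name__ == "__main__":
    for beta in ("0.5", "0.99"):
        values = nu_infinity(beta)
        print(beta, [round(float(v), 3) for v in values])
```

Passing β as a string keeps values such as 0.99 exact; converting the float `0.99` to a `Fraction` would carry its binary rounding error into the result.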
Now we can calculate the finite total expected rewards for the given values of β, namely $\beta_1 = 0.5$ and $\beta_2 = 0.99$. For $\beta_1 = 0.5$ we obtain
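$$\nu_\infty(0.5)
= \frac{1}{1 - 0.5} \begin{bmatrix} 5.5 \\ 5.5 \\ 5.5 \end{bmatrix}
+ \frac{1}{1 + 0.5} \begin{bmatrix} 2.5 \\ -2.5 \\ \frac{5}{6} \end{bmatrix}
+ \frac{1}{1 - 0.5 \cdot 0.5} \begin{bmatrix} 0 \\ 0 \\ -\frac{28}{3} \end{bmatrix}$$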
and next
$$\nu_{1,\infty}(0.5) = 2 \cdot 5.5 + 0.667 \cdot 2.5 + 1.333 \cdot 0 = 11 + 1.667 + 0 = 12.667,$$
$$\nu_{2,\infty}(0.5) = 2 \cdot 5.5 - 0.667 \cdot 2.5 + 1.333 \cdot 0 = 11 - 1.667 + 0 = 9.333,$$