where $V_d$ denotes the expected lifetime utility, at the start of a period and before pairing takes place, of a player who currently selects Z, given that $d = 1, \dots, 4$ players currently choose Z.²⁶ Using the vector of probabilities $\rho$ and the transition matrix $A$, and denoting by $A_d$ the $d$th row of $A$, we have:
$$V_d = z + \rho_d (h - z) + \delta A_d V$$
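Because this equation holds for every $d$, the four equations can be stacked into one linear system and solved in a single step. A short sketch, writing $\mathbf{1}$ for the vector of ones and assuming $0 \le \delta < 1$ (then $I - \delta A$ is invertible, because $A$ is a stochastic matrix and every eigenvalue of $\delta A$ has modulus at most $\delta$):

$$
V = z\mathbf{1} + (h - z)\rho + \delta A V
\quad\Longrightarrow\quad
V = (I - \delta A)^{-1}\bigl[z\mathbf{1} + (h - z)\rho\bigr].
$$

Reading the coefficients off system (1) for the four-player case gives

$$
\rho = \left(1, \tfrac{2}{3}, \tfrac{1}{3}, 0\right)^{\top},
\qquad
A = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & \tfrac{1}{3} & 0 & \tfrac{2}{3} \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{pmatrix}.
$$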
That is, the expected current utility depends on the probability $\rho_d$ of encountering a cooperator: when he meets a cooperator the player earns $h$, and otherwise he earns $z$. The continuation payoff is 0 with probability $1 - \delta$ and $A_d V$ with probability $\delta$. This last term reflects that current play may lead to different numbers of cooperators tomorrow, depending on the outcome of the pairing process. Specifically, we have:
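$$
\begin{aligned}
V_1 &= h + \delta V_2 \\
V_2 &= z + \tfrac{2}{3}(h - z) + \delta\left(\tfrac{1}{3} V_2 + \tfrac{2}{3} V_4\right) \\
V_3 &= z + \tfrac{1}{3}(h - z) + \delta V_4 \\
V_4 &= z + \delta V_4
\end{aligned}
\qquad (1)
$$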
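The probabilities in system (1) can be checked by brute force. The sketch below is illustrative only: it assumes that pairing is uniformly random over all ways of splitting the four players into pairs, that a player who chooses Z keeps doing so, and that a cooperator who meets a Z-player chooses Z from the next period on. It enumerates the three possible pairings to recover $\rho$ and $A$, then solves $V = (I - \delta A)^{-1}[z\mathbf{1} + (h - z)\rho]$ exactly with rational arithmetic. The function names and the parameter values $h = 3$, $z = 1$, $\delta = 1/2$ are hypothetical.

```python
from fractions import Fraction

N = 4  # number of players in the example of system (1)


def matchings(players):
    """Yield every perfect matching (list of pairs) of an even-sized list."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in matchings(remaining):
            yield [(first, partner)] + m


def rho_and_A(n=N):
    """Return rho[d] and A[d][d'] for d = 1..n (index 0 is unused).

    Player 0 is the focal player; players 0..d-1 choose Z, the rest cooperate.
    Assumed rule: Z-players keep choosing Z, and a cooperator paired with a
    Z-player chooses Z from the next period on.
    """
    rho = [Fraction(0)] * (n + 1)
    A = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    all_m = list(matchings(list(range(n))))
    p = Fraction(1, len(all_m))  # each matching is equally likely
    for d in range(1, n + 1):
        for m in all_m:
            partner = dict(m + [(b, a) for a, b in m])
            if partner[0] >= d:  # focal player meets a cooperator
                rho[d] += p
            # each mixed (Z, cooperator) pair converts one cooperator
            newly_z = sum(1 for a, b in m if (a < d) != (b < d))
            A[d][d + newly_z] += p
    return rho, A


def solve(M, b):
    """Solve M x = b exactly by Gauss-Jordan elimination (M invertible)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def values(h, z, delta, n=N):
    """Return [V_1, ..., V_n] from V = (I - delta*A)^{-1} (z + (h - z) rho)."""
    rho, A = rho_and_A(n)
    idx = range(1, n + 1)
    M = [[(1 if i == j else 0) - delta * A[i][j] for j in idx] for i in idx]
    b = [z + (h - z) * rho[i] for i in idx]
    return solve(M, b)


rho, A = rho_and_A()
print(rho[1:])  # [Fraction(1, 1), Fraction(2, 3), Fraction(1, 3), Fraction(0, 1)]
print(values(3, 1, Fraction(1, 2)))  # [Fraction(24, 5), Fraction(18, 5), Fraction(8, 3), Fraction(2, 1)]
```

With these illustrative numbers one can verify each line of (1) by hand: $V_4 = z/(1-\delta) = 2$, $V_3 = 1 + \tfrac{2}{3} + \tfrac{1}{2}\cdot 2 = \tfrac{8}{3}$, and so on. Exact fractions are used so that the recovered coefficients match $\tfrac{1}{3}$ and $\tfrac{2}{3}$ without rounding.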
To see how these equations are derived, consider the first two lines. In the first line, the player is the initial deviator, so he is certainly paired with a cooperator, i.e., $\rho_1 = 1$, and he earns current payoff $h$. The cooperator he meets will choose Z from then on, so tomorrow two players choose Z. Thus, the current deviator's continuation payoff is $\delta V_2$. In the second line, the player currently chooses Z, so he earns $z$ if he meets the only other player who chooses Z (with probability $\tfrac{1}{3}$) and $h$ if he meets a cooperator (with probability $\tfrac{2}{3}$). This gives expected current utility $z + \tfrac{2}{3}(h - z)$. The continuation payoff depends on which of these pairings took place. If he met the other deviator, the two cooperators were paired with each other and no cooperator observed Z today, so tomorrow there will still be two players who select Z. Otherwise, both cooperators
²⁶ Clearly, the agent selects Z as a deviation from equilibrium when $d = 1$; in this case the agent is the initial deviator. If $d = 2$, instead, the agent may select Z simply because he observed Z in the past and now follows the sanctioning rule.