J(\tau, x; u) = \mathrm{E}\left[ \int_{\tau}^{T} g(X(t), u(t), t)\,dt + s(X(T)) \;\Big|\; X(\tau) = x \right] \qquad (4)
is the profit-to-go function. For Markovian feedback controls the maximum value of (4),

H(\tau, x) = \max_{\mu(\cdot)} J(\tau, x; \mu(\cdot)),

satisfies the HJB equation
\max_{u} \left\{ g(x, u, t) + \mathcal{L}^{u} H(t, x) \right\} = 0 \qquad (5)
with the boundary condition
H(T, X(T)) = s(X(T)) \qquad (6)
where \mathcal{L}^{u} is the operator

\mathcal{L}^{u} = \frac{\partial}{\partial \tau} + \sum_{i=1}^{n} f_{i}(x, u, \tau)\,\frac{\partial}{\partial x_{i}} + \frac{1}{2} \sum_{i,j=1}^{n} B_{ij}(x, u, \tau)\,\frac{\partial^{2}}{\partial x_{i}\,\partial x_{j}} \qquad (7)

and where f_i is the i-th component function of f and B_{ij} is the (i, j)-th entry of the covariance matrix B = bb^T.
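Since the approximation scheme below is introduced for a one-dimensional process, it may help to record the scalar form of (5) and (7): with n = 1 and B_{11} = b^2 (scalar diffusion coefficient b), the HJB equation reads

\frac{\partial H}{\partial \tau} + \max_{u}\left\{ g(x, u, \tau) + f(x, u, \tau)\,\frac{\partial H}{\partial x} + \frac{1}{2}\, b^{2}(x, u, \tau)\,\frac{\partial^{2} H}{\partial x^{2}} \right\} = 0 .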
2.2 A Corresponding Markov Decision Chain
A Markov decision chain corresponding to the optimisation problem of maximising (4) subject to (1) and (2) is obtained through the following three steps.
First, the state equation (1) is discretised in
time using the Euler-Maruyama approximation
(see [5]). Then, the state space is restricted to a
finite dimensional discrete state grid and, finally,
the transition probabilities and rewards for these
discrete states are specified.
Euler-Maruyama Approximation. An Euler-Maruyama approximation of a process⁴ X ⊂ ℝ¹ that satisfies equation (1) is a stochastic process

Y = \{Y_i \in X, \; 0 \le i \le N\}

satisfying the equation (called the iterative scheme)

Y_{i+1} = Y_i + f(Y_i, u_i, \tau_i)\,(\tau_{i+1} - \tau_i) + b(Y_i, u_i, \tau_i)\,\bigl(W(\tau_{i+1}) - W(\tau_i)\bigr) \qquad (8)
where \tau = \{\tau_i\}_{i=0}^{N} with \tau_0 = 0 and \tau_N = T is a strictly increasing sequence of real numbers that partitions the time interval [0, T]. The indices run i = 0, 1, 2, \ldots, N-1; the initial and subsequent values are, respectively,

Y_0 = X(0) = x_0, \qquad Y_i = Y(\tau_i). \qquad (9)
For a time discretisation using a constant time step (where N is a positive integer),

\tau_i = i\,\delta \quad \text{where} \quad \delta = \tau_{i+1} - \tau_i = \frac{T}{N}. \qquad (10)
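As an illustration only, a minimal Python sketch of the iterative scheme (8) on the constant-step grid (10) might look as follows; the drift f, diffusion b, control sequence u and all function names are placeholders, since the concrete model (1) is problem-specific.

```python
import numpy as np

def euler_maruyama_path(x0, f, b, u, T, N, rng=None):
    """Simulate one path of the iterative scheme (8) on the constant
    time grid (10): tau_i = i * delta with delta = T / N."""
    rng = np.random.default_rng() if rng is None else rng
    delta = T / N
    Y = np.empty(N + 1)
    Y[0] = x0                                   # Y_0 = X(0) = x_0, cf. (9)
    for i in range(N):
        tau_i = i * delta
        dW = rng.normal(0.0, np.sqrt(delta))    # W(tau_{i+1}) - W(tau_i) ~ N(0, delta)
        Y[i + 1] = (Y[i]
                    + f(Y[i], u[i], tau_i) * delta
                    + b(Y[i], u[i], tau_i) * dW)
    return Y

# Example with a hypothetical controlled drift and constant diffusion:
# path = euler_maruyama_path(x0=1.0,
#                            f=lambda y, u_, t: u_ - 0.5 * y,
#                            b=lambda y, u_, t: 0.2,
#                            u=np.zeros(100), T=1.0, N=100)
```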
Notation. The discretisation scheme, while intuitively simple, overlays several layers of discretisation: of time, of state, and of noise. We adopt the following conventions.
(1) Continuous-time variables: x(t) (standard); variables in discrete time: x_i.
(2) Points of the discrete state space ("grid"): x ∈ X_i.
(3) Stochastic processes: x (bold).
Discrete State Space. Equidistant grids will be used for simplicity (see [8]). The discrete state space for stage i is denoted by X_i ⊂ ℝ¹. Let the upper and lower bounds of the state grid be U_i = max X_i and L_i = min X_i, respectively. A point x ∈ X is defined to be within the grid X_i if L_i ≤ x ≤ U_i. The collection of the discrete state spaces for all the stages, \{X_i\}_{i=0}^{N}, is denoted X and called the discrete state space.
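For concreteness, an equidistant stage grid of this kind could be built as in the following Python sketch; the bounds L_i, U_i and the number of grid points m are assumed inputs, not values prescribed here.

```python
import numpy as np

def stage_grid(L_i, U_i, m):
    """Equidistant discrete state space X_i with m points,
    so that min(X_i) = L_i and max(X_i) = U_i."""
    return np.linspace(L_i, U_i, m)

# Collection over all stages i = 0, ..., N (hypothetical bound arrays L, U):
# X = [stage_grid(L[i], U[i], m) for i in range(N + 1)]
```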
Adjacency. Heuristically, the scheme approximates a point of X at stage i by the points of X_i which are "adjacent" to it (a Python sketch of this rule is given after the list).
(1) Two states of X_i are adjacent if no other state of X_i lies between them⁵.
(2) Given a point of the continuous state space, x ∈ X, a pair of states, x^⊖ ∈ X_i and x^⊕ ∈ X_i, is adjacent to x if the states are adjacent and x^⊖ < x < x^⊕.
(3) Given x ∈ X with x > U_i, define U_i to be adjacent to x.
(4) Given x ∈ X with x < L_i, define L_i to be adjacent to x.
(5) Given x ∈ X with x ∈ X_i, define x to be adjacent to itself.
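A minimal Python sketch of this adjacency rule for an equidistant stage grid (function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def adjacent_states(x, X_i):
    """Return the grid states of X_i adjacent to a continuous point x,
    following rules (1)-(5): the bracketing pair inside the grid, the
    nearest bound outside it, or x itself if it lies on the grid."""
    X_i = np.sort(np.asarray(X_i))
    L_i, U_i = X_i[0], X_i[-1]
    if x > U_i:                          # rule (3): above the grid
        return (U_i,)
    if x < L_i:                          # rule (4): below the grid
        return (L_i,)
    k = np.searchsorted(X_i, x)          # first index with X_i[k] >= x
    if X_i[k] == x:                      # rule (5): x is a grid point
        return (x,)
    return (X_i[k - 1], X_i[k])          # rule (2): pair with x^- < x < x^+

# Example: adjacent_states(0.37, np.linspace(0.0, 1.0, 11)) ≈ (0.3, 0.4)
```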
4 The approximation scheme is introduced for a one-dimensional process. The extension of the scheme to ℝⁿ is obvious.
5 In ℝⁿ two states are adjacent if their projections onto each of the n coordinate axes are adjacent in the sense just defined.