where E denotes the expectation operator. Finding w* is precisely the problem of
determining the parameters of an optimal least-squares approximator to E(y | x), the
conditional expectation of y given x. The expectation defining the optimand is
unknown, so the problem has to be solved statistically.
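This claim follows from the standard decomposition of the expected squared error (a sketch, writing Ω_L(x, w) for the network output as in equation (9) below; the cross term vanishes after conditioning on x):

$$ E\big[(y - \Omega_L(x, w))^2\big] = E\big[(y - E(y \mid x))^2\big] + E\big[(E(y \mid x) - \Omega_L(x, w))^2\big] $$

The first term does not depend on w, so minimizing the expected squared error over w amounts to finding the best least-squares approximation of E(y | x) within the model family.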
Backpropagation of Gradient Descent Errors
Backpropagation of gradient descent errors is one such method: it allows the parameters to
be learned from experience in a process that resembles trial and error (see Rumelhart,
Hinton and Williams 1986). Experience is based on empirical observations of the
spatial interaction phenomenon of interest. Thus, we assume that a training set is
available, consisting of observations x_u (u = 1, ..., U) on the input variables together
with observations y_u (u = 1, ..., U) on the corresponding target variables that the network
model is to learn to associate with x_u. According to the backpropagation of gradient
descent errors procedure, one starts with a set of random weights, say w(0), and then
updates them at the n-th step by the following formula:
$$ w^{(n+1)} = w^{(n)} + \eta \, \nabla \Omega_L\big(x_{u+1}, w^{(n)}\big) \big(y_{u+1} - \Omega_L(x_{u+1}, w^{(n)})\big) \qquad (9) $$
where η is a learning rate and ∇Ω_L denotes the gradient of Ω_L with respect to w. The
weights are adjusted in response to errors in hitting the target, where errors are measured
in terms of the least-squares error function. This error is propagated back through the network. Although
many modifications of and alternatives to this parameter estimation approach have been
suggested in the neural network literature over the past years, experience shows that
surprisingly good network model performance can often be achieved with the epoch-
based stochastic version of this learning approach (see Fischer and Gopal 1994).
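As an illustration, the following minimal sketch implements one application of update rule (9) in Python. Since the concrete form of model (7) is not repeated here, it assumes a generic single-hidden-layer feedforward network with logistic transfer functions; all function and variable names are hypothetical.

```python
import numpy as np

def logistic(a):
    """Logistic (sigmoid) transfer function."""
    return 1.0 / (1.0 + np.exp(-a))

def omega(x, w1, w2):
    """Network output Omega_L(x, w) for a single input pattern x."""
    hidden = logistic(w1 @ x)             # hidden-unit activations, shape (H,)
    return logistic(w2 @ hidden), hidden  # scalar output plus hidden activations

def sgd_step(x_u, y_u, w1, w2, eta=0.1):
    """One step of rule (9): w(n+1) = w(n) + eta * grad(Omega) * (y - Omega)."""
    out, hidden = omega(x_u, w1, w2)
    err = y_u - out                                   # (y_{u+1} - Omega_L(x_{u+1}, w(n)))
    d_out = out * (1.0 - out)                         # derivative of the logistic output unit
    grad_w2 = d_out * hidden                          # gradient of Omega_L w.r.t. output weights
    d_hidden = d_out * w2 * hidden * (1.0 - hidden)   # error signal propagated back to the hidden layer
    grad_w1 = np.outer(d_hidden, x_u)                 # gradient of Omega_L w.r.t. hidden-layer weights
    w2 = w2 + eta * err * grad_w2                     # weights adjusted in response to the error
    w1 = w1 + eta * err * grad_w1
    return w1, w2

# Example: three input variables, five hidden units, random initial weights w(0)
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(5, 3))
w2 = rng.normal(scale=0.1, size=5)
w1, w2 = sgd_step(np.array([0.2, 0.5, 0.1]), 0.7, w1, w2)
```

Repeating such steps over the training patterns, either pattern by pattern or over complete passes through the data, corresponds to the stochastic and epoch-based variants of the learning approach mentioned above.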
2.3 Departure from the Classical Neural Network Approach
Although classical neural spatial interaction models of type (7) represent a rich and
flexible family of spatial interaction function approximators for real world applications,