4 Training the Modular Product Unit Network Model
4.1 The Optimisation Problem
Having identified the model structure for singly constrained spatial interaction
prediction in the previous section, we can now follow section 2.2 and view network
training in an optimisation context, treating the parameter estimation problem as
least squares learning. The goal of least squares learning is to find $w^*$ such that
the least squares error function, say $Q$, is minimized:
$$\min_{w} Q\left(x^{u1}, y^{u1}, w\right) = \min_{w} \sum_{(x^{u1},\, y^{u1}) \in M_1} \left( y^{u1} - \Omega\left(x^{u1}, w\right) \right)^2 \qquad (26)$$
where $M_1$ denotes the available training set, consisting of observations, say
$x^{u1}$ $(u = 1, \ldots, U_1)$, on the input variables, together with observations, say
$y^{u1}$ $(u = 1, \ldots, U_1)$, on the corresponding target variables that the network
model $\Omega(\cdot, w)$ is to learn to associate with the $x^{u1}$.
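To make the objective concrete, the following minimal sketch evaluates the error function of equation (26) in Python, assuming a generic single-module product unit form $\Omega(x, w) = \sum_h c_h \prod_n x_n^{w_{hn}}$ as a hypothetical stand-in for the modular network of the previous section; the function names and toy shapes are illustrative only, not the paper's actual implementation.

```python
import numpy as np

def omega(x, w_exp, c):
    """Hypothetical product-unit output: sum_h c_h * prod_n x_n ** w_exp[h, n].
    x: (N,) positive inputs; w_exp: (H, N) exponents; c: (H,) output weights."""
    return np.sum(c * np.prod(x[None, :] ** w_exp, axis=1))

def Q(X, y, w_exp, c):
    """Least squares error over the training set M1, as in eq. (26)."""
    preds = np.array([omega(x, w_exp, c) for x in X])
    return np.sum((y - preds) ** 2)

# Toy training set: U1 = 4 observations on N = 3 positive input variables.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 2.0],
              [3.0, 2.0, 1.0],
              [1.0, 1.0, 1.0]])
y = np.array([0.5, 1.2, 0.8, 1.0])
w_exp = np.full((2, 3), 0.1)   # H = 2 product units
c = np.array([0.5, 0.5])
print(Q(X, y, w_exp, c))       # error to be minimised over (w_exp, c)
```

Note that the inputs must be strictly positive, since product units raise them to real-valued exponents.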
$Q(x^{u1}, y^{u1}, w)$ is non-negative and continuously differentiable on the
parameter space, which is a finite-dimensional closed and bounded domain and, thus,
compact. The compactness of the parameter space is of great theoretical convenience:
since a continuous function on a compact set attains its extrema, it can be shown that
$Q(x^{u1}, y^{u1}, w)$ attains its minimum at some weight vector $w^*$ under these
conditions. Characteristically, however, real-world applications exhibit many minima,
all of which satisfy
$$\nabla Q\left(x^{u1}, y^{u1}, w^*\right) = 0 \qquad (27)$$

where $\nabla Q$ denotes the gradient of $Q$. The minimum for which the value of $Q$ is
smallest is termed the global minimum, while the other minima are called local minima.
Unfortunately, there is no guarantee about which kind of minimum is encountered.
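The practical consequence of equation (27) can be illustrated with plain gradient descent on a toy one-parameter error surface, a sketch under the assumption of a simple non-convex $Q$ rather than the model's actual error function: different initialisations converge to different stationary points, each satisfying the vanishing-gradient condition, yet only one is the global minimum.

```python
import numpy as np

def grad(Q, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of Q."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (Q(w + e) - Q(w - e)) / (2.0 * eps)
    return g

# Toy non-negative, non-convex error surface with two minima (illustrative only).
Q = lambda w: (w[0] ** 2 - 1.0) ** 2 + 0.3 * w[0] + 1.0

for w0 in (np.array([-2.0]), np.array([2.0])):
    w = w0.copy()
    for _ in range(500):
        w -= 0.05 * grad(Q, w)  # plain gradient descent step
    print(f"start {w0[0]:+.1f} -> w* = {w[0]:+.4f}, "
          f"Q = {Q(w):.4f}, gradient = {grad(Q, w)[0]:+.2e}")
# Both runs drive the gradient to zero, satisfying eq. (27), but only the
# run started at -2.0 reaches the global minimum; the other is local.
```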
The fraction of the parameter space that contains a solution depends on the capacity of
the network model (in an information-theoretic sense) and the complexity of the