of the model. Small networks learn slowly, or may fail to reach an acceptable error
level at all. Larger networks usually learn better, but such sizes may lead to a
degradation of generalisation performance known as overtraining or overfitting. The
correct number of hidden units is problem-dependent, a function of both the complexity
of the input-output mapping to be realised by the model and the required accuracy. The
common goal is to simplify the model in terms of H without sacrificing its generalisation
performance.
Various techniques have been developed in the neural network literature to control the
effective complexity of neural network models, in most cases as part of the network
training process itself. Since the maximum complexity of our model can be controlled
by limiting the number of hidden units, one obvious approach to the bias-variance
trade-off is to train several candidate models with different numbers of hidden product
units and to select the one that gives the best generalisation performance; a sketch of
this procedure follows below. An obvious drawback of such an approach is its
trial-and-error nature.
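As a minimal sketch of this candidate-selection procedure (not the authors' own implementation), the following Python fragment trains one network per candidate value of H and keeps the one with the lowest validation error. Scikit-learn's MLPRegressor serves here only as a stand-in for the product unit network, and the data arrays X_train, y_train, X_val and y_val are assumed to be given:

    # Sketch: select H by training one candidate per value and
    # comparing errors on held-out (validation) data. MLPRegressor
    # is a stand-in for the product unit network described in the text.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def select_hidden_units(X_train, y_train, X_val, y_val,
                            candidates=(2, 4, 8, 16)):
        best_model, best_err = None, np.inf
        for H in candidates:
            # Train this candidate to convergence.
            model = MLPRegressor(hidden_layer_sizes=(H,),
                                 max_iter=2000, random_state=0)
            model.fit(X_train, y_train)
            # Mean squared error on the independent validation data.
            err = np.mean((model.predict(X_val) - y_val) ** 2)
            if err < best_err:
                best_model, best_err = model, err
        return best_model, best_err

Each candidate must be trained to convergence before it can be compared, which is precisely what makes this approach costly in practice.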
Another, more principled approach to the problem, utilised by Fischer and Gopal (1994),
is the procedure of stopped or cross-validated training. Here, an overparametrised
model (larger H) is trained until the error on further independent data, called the
validation set, deteriorates; training is then stopped. This contrasts with the above
approach in that the choice of H does not require convergence of the training process.
Instead, the training process performs a directed search of the parameter space for a
model that does not overfit the data and thus generalises well. A minimal sketch of this
procedure follows.
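The sketch below again uses MLPRegressor as a stand-in (the source gives no code). It trains an overparametrised network incrementally and halts once the validation error has failed to improve for a fixed number of epochs; this "patience" threshold is a common stopping criterion but is an assumption here, not taken from the source:

    # Sketch: stopped (cross-validated) training of an
    # overparametrised network. Training halts when the validation
    # error has not improved for `patience` consecutive epochs, and
    # the best-scoring weights seen so far are returned.
    import copy
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def train_with_stopping(X_train, y_train, X_val, y_val,
                            H=50, max_epochs=500, patience=10):
        model = MLPRegressor(hidden_layer_sizes=(H,), random_state=0)
        best_model, best_err, stale = None, np.inf, 0
        for epoch in range(max_epochs):
            model.partial_fit(X_train, y_train)  # one training pass
            err = np.mean((model.predict(X_val) - y_val) ** 2)
            if err < best_err:
                # Checkpoint the best weights seen so far.
                best_model, best_err, stale = copy.deepcopy(model), err, 0
            else:
                stale += 1
            if stale >= patience:
                break  # validation error is deteriorating: stop
        return best_model, best_err

Note that the sketch checkpoints the best model rather than returning the final one, since by construction training only stops after the validation error has already begun to deteriorate.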
But this approach has its shortcomings too. First, it may be hard in practice to identify
when to stop training. Second, the results may depend on the specific training
set-validation set pair chosen. Third, the model with the best performance on the
validation set might not be the one with the best performance on the test set.
The second issue involves network training or learning [i.e. parameter estimation]. This
issue will be addressed in more detail in the next section.