Table 2: The Modular Product Unit Neural Network: Choice of H and δ [T = 1,000; N = 10]

Parameter             | Iterations        | ARV(M1)          | ARV(M2)
H = 4    δ = 0.0025   | 40,782 ± 19,886   | 0.2217 (0.0134)  | 0.1490 (0.0135)
H = 8    δ = 0.0025   | 43,905 ± 18,854   | 0.2215 (0.0124)  | 0.1490 (0.0099)
H = 12   δ = 0.0025   | 38,905 ± 17,896   | 0.2239 (0.0118)  | 0.1475 (0.0074)
H = 16   δ = 0.0005   | 52,702 ± 33,311   | 0.2483 (0.0296)  | 0.1663 (0.0295)
         δ = 0.0010   | 59,321 ± 49,647   | 0.2368 (0.0244)  | 0.1585 (0.0263)
         δ = 0.0025   | 45,754 ± 21,284   | 0.2212 (0.0087)  | 0.1473 (0.0054)
         δ = 0.0050   | 22,948 ± 15,360   | 0.2216 (0.0107)  | 0.1512 (0.0090)
         δ = 0.0075   | 17,427 ± 12,918   | 0.2206 (0.0115)  | 0.1547 (0.0094)
         δ = 0.0100   | 13,545 ± 11,753   | 0.2241 (0.0151)  | 0.1593 (0.0131)
H = 24   δ = 0.0025   | 40,580 ± 20,047   | 0.2230 (0.0097)  | 0.1481 (0.0053)
ARV-performance values represent the mean (standard deviations in brackets) of 60 simulations differing in the initial parameter values, randomly chosen from [-0.3; 0.3].
Iterations: number of iterations required to reach the parameter vector that provides the best ARV(M2) performance.
ARV(M1): in-sample performance measured in terms of relative average variances.
ARV(M2): out-of-sample performance measured in terms of relative average variances.
M consists of 992 patterns, M1 of 496 patterns, M2 of 248 patterns and M3 of 248 patterns.
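The relative average variance reported in the table can be read as a normalised mean squared error. The following is a minimal sketch of how an ARV score is commonly computed, assuming the usual definition (mean squared prediction error on a subset divided by the variance of the targets in that subset); the function name and the NumPy-based implementation are illustrative rather than the authors' code.

    import numpy as np

    def arv(targets, predictions):
        """Relative average variance (ARV) of the predictions on one data subset."""
        targets = np.asarray(targets, dtype=float)
        predictions = np.asarray(predictions, dtype=float)
        mse = np.mean((targets - predictions) ** 2)           # average squared prediction error
        variance = np.mean((targets - targets.mean()) ** 2)   # variance of the targets in the subset
        return mse / variance                                  # ARV = 1 for always predicting the subset mean

Under this definition, ARV(M1) would correspond to arv evaluated on the training subset M1 and ARV(M2) to the same quantity on the validation subset M2, so values well below 1 indicate that the network explains a substantial part of the target variance.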
Figure 3 shows the learning curves of a typical run of the model (H = 16; trained with Alopex using T = 1,000, N = 10 and δ = 0.0025) in terms of ARV(M1), ARV(M2) and ARV(M3), respectively. The term learning curve is used to characterise the performance as a function of the number of iterations of the Alopex procedure. Figure 3(a) plots the ARV-performance on the training set, Figure 3(b) the ARV-performance on the validation set and Figure 3(c) the ARV-performance on the testing set.
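To make an "iteration of the Alopex procedure" concrete, the following is a minimal sketch of one Alopex update, assuming the standard correlation-based formulation; the function name, the NumPy usage and the details shown are assumptions for illustration, not the authors' implementation, and the schedule that recomputes the temperature T from recent correlations is omitted.

    import numpy as np

    def alopex_step(w, w_prev, E, E_prev, delta, T, rng):
        """One Alopex perturbation of the weight vector w (a NumPy array)."""
        # Correlation between the previous weight change and the previous error change.
        C = (w - w_prev) * (E - E_prev)
        # Probability of stepping by -delta for each weight; it exceeds 0.5 whenever
        # the previous move of that weight was correlated with an increase in the error.
        p = 1.0 / (1.0 + np.exp(-C / T))
        step = np.where(rng.random(w.shape) < p, -delta, delta)
        # A larger T pushes p towards 0.5, i.e. towards a random walk, which is what
        # allows the procedure to escape shallow local minima of the error surface.
        return w + step

Every weight is thus perturbed by ±δ at each iteration, which is why the step size δ and the temperature control in Table 2 govern both the number of iterations needed and the final ARV levels.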
Typically, at the beginning of the training process the validation error oscillates rapidly. Later, around 5,000 iterations, the training process stabilises and the changes in the validation error become smaller. Instead of showing the clear upward trend in the validation error that characterises overfitting, the validation error starts, at around 12,500 iterations, to wander around some constant value. These undulations are caused by increases of T intended to escape from shallow, local minima of the error surface (see Figure 3(d)). Later, the training