992 samples into three subsets¹²: the training [in-sample] set
$M_1 = \{(x^{u_1}, y^{u_1}) \text{ with } u_1 = 1, \ldots, U_1 = 496 \text{ patterns}\}$, the internal validation set
$M_2 = \{(x^{u_2}, y^{u_2}) \text{ with } u_2 = 1, \ldots, U_2 = 248 \text{ patterns}\}$ and the testing [prediction, out-of-sample]
set $M_3 = \{(x^{u_3}, y^{u_3}) \text{ with } u_3 = 1, \ldots, U_3 = 248 \text{ patterns}\}$. $M_1$ is used only for
parameter estimation, while $M_2$ is used for validation. The generalisation performance of the
model is assessed on the testing set $M_3$.
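A random three-way partition with the subset sizes used here ($U_1 = 496$, $U_2 = U_3 = 248$) can be sketched as follows. This is a minimal illustration, assuming that patterns are identified by their indices 0 to 991 and that a uniform random shuffle is acceptable; the function name and the seed are hypothetical, and, as discussed next, the split actually used for constrained spatial interaction was not purely random.

```python
import random


def random_three_way_split(n_patterns, sizes=(496, 248, 248), seed=0):
    """Randomly partition pattern indices 0..n_patterns-1 into a training
    set (M1), an internal validation set (M2) and a testing set (M3).
    Hypothetical helper for illustration only."""
    if sum(sizes) != n_patterns:
        raise ValueError("subset sizes must add up to the number of patterns")
    indices = list(range(n_patterns))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    u1, u2, _ = sizes
    m1 = indices[:u1]            # used only for parameter estimation
    m2 = indices[u1:u1 + u2]     # used for validation
    m3 = indices[u1 + u2:]       # used only to assess generalisation
    return m1, m2, m3


m1, m2, m3 = random_three_way_split(992)
print(len(m1), len(m2), len(m3))  # 496 248 248
```

Because the three slices are taken from a single shuffled list, the subsets are disjoint by construction and together cover all 992 patterns.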
Though the simplicity of this method is appealing, an obvious concern is the necessary
reduction in the amount of training data. In deciding how to partition the data, a
compromise had to be made between a test set large enough to assess the fitted model
reliably and a sufficient amount of training and internal validation data. If the test set
is too small, the prediction error estimate will have a high variance because it rests on
only a few observations. Although random splits are commonly used and appear to work
reasonably well for unconstrained spatial interaction, a more systematic splitting method
had to be used for constrained spatial interaction.
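The text does not specify which systematic method was used. One plausible scheme, offered purely as a hypothetical sketch, keeps all flows from a given origin region in the same subset, so that each origin total $t_{i\cdot}$ reported in Table 1 is built from flows that all belong to one subset. Note that 992 = 32 × 31, the number of off-diagonal origin-destination pairs among 32 regions; assuming (not confirmed by the text) that this is how the patterns arise, assigning 16, 8 and 8 whole origins to $M_1$, $M_2$ and $M_3$ reproduces the subset sizes 496, 248 and 248. The function name, the region count and the seed are illustrative assumptions.

```python
import random


def origin_grouped_split(n_regions=32, origin_counts=(16, 8, 8), seed=0):
    """Hypothetical systematic split: whole origin regions are assigned to
    M1, M2 and M3, so every flow (i, j) with the same origin i ends up in
    the same subset. Assumes n_regions regions and only off-diagonal
    flows (i != j)."""
    if sum(origin_counts) != n_regions:
        raise ValueError("origin counts must add up to the number of regions")
    origins = list(range(n_regions))
    random.Random(seed).shuffle(origins)  # which origins go where is random
    subsets = []
    start = 0
    for count in origin_counts:
        chosen = origins[start:start + count]
        start += count
        # all origin-destination pairs leaving the chosen origins
        pairs = [(i, j) for i in chosen for j in range(n_regions) if j != i]
        subsets.append(pairs)
    return subsets


m1, m2, m3 = origin_grouped_split()
print(len(m1), len(m2), len(m3))  # 496 248 248
```

Grouping by origin trades some randomness for internal consistency of the origin totals; whether this matches the authors' actual procedure cannot be determined from this section.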
Table 1: Descriptive Statistics: The Training, Validation and Testing Sets

| Variable | Mean | Standard Deviation | Minimum | Maximum |
|---|---:|---:|---:|---:|
| Whole Set M | | | | |
| $s_j$ | 26,364,563 | 50,350,660 | 2,310,400 | 285,193,984 |
| $d_{ij}$ | 229.4 | 124.6 | 30.0 | 630.0 |
| $t_{ij}$ | 8.6 | 22.6 | 0.0 | 257.9 |
| $t_{i\cdot}$ | 266.0 | 350.1 | 41.9 | 1830.1 |
| Training Set M1 | | | | |
| $s_j$ | 26,142,923 | 49,711,907 | 2,310,400 | 285,193,984 |
| $d_{ij}$ | 234.1 | 129.6 | 35.0 | 630.0 |
| $t_{ij}$ | 9.6 | 26.2 | 0.0 | 257.9 |
| $t_{i\cdot}$ | 297.0 | 429.1 | 41.9 | 1830.1 |
| Validation Set M2 | | | | |
| $s_j$ | 26,517,946 | 50,891,071 | 2,310,400 | 285,193,984 |
| $d_{ij}$ | 219.3 | 121.4 | 30.0 | 590.0 |
| $t_{ij}$ | 7.1 | 16.6 | 0.0 | 166.8 |
| $t_{i\cdot}$ | 220.9 | 221.4 | 45.6 | 759.8 |
| Testing Set M3 | | | | |
| $s_j$ | 26,654,459 | 51,069,577 | 2,310,400 | 285,193,984 |
| $d_{ij}$ | 230.3 | 116.7 | 37.0 | 627.0 |
| $t_{ij}$ | 8.0 | 19.7 | 0.0 | 195.2 |
| $t_{i\cdot}$ | 249.0 | 262.5 | 55.3 | 895.7 |
Note: M consists of 992 patterns, M1 of 496 patterns, M2 of 248 patterns and M3 of 248
patterns.
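Because M1, M2 and M3 are disjoint and together make up M, the whole-set mean of each variable should equal the size-weighted average of the three subset means (weights 496/992, 248/992 and 248/992), and the whole-set minimum should equal the smallest subset minimum. The short check below, a sketch using only the published (rounded) values from Table 1, illustrates this; the function and variable names are illustrative.

```python
# Subset sizes and means taken from Table 1 (rounded as published).
sizes = {"M1": 496, "M2": 248, "M3": 248}
means = {
    "s_j":  {"M1": 26_142_923, "M2": 26_517_946, "M3": 26_654_459},
    "d_ij": {"M1": 234.1, "M2": 219.3, "M3": 230.3},
    "t_ij": {"M1": 9.6, "M2": 7.1, "M3": 8.0},
    "t_i.": {"M1": 297.0, "M2": 220.9, "M3": 249.0},
}


def pooled_mean(subset_means, subset_sizes):
    """Size-weighted average of subset means, i.e. the mean of the union
    of disjoint subsets."""
    total = sum(subset_sizes.values())
    return sum(subset_means[k] * subset_sizes[k] for k in subset_sizes) / total


for var, m in means.items():
    # compare with the Whole Set M row of Table 1
    print(var, round(pooled_mean(m, sizes), 2))

# The whole-set minimum of d_ij is the smallest of the subset minima.
d_ij_min = {"M1": 35.0, "M2": 30.0, "M3": 37.0}
print("d_ij minimum", min(d_ij_min.values()))  # Table 1 whole-set value: 30.0
```

The pooled means agree with the whole-set row up to the rounding of the subset means. Standard deviations cannot be checked this simply, since pooling them also requires the spread of the subset means around the overall mean and knowledge of whether n or n − 1 was used as the divisor.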