992 samples into three subsets¹²: the training [in-sample] set
M1 = {(xu1, yu1) with u1 = 1,...,U1 = 496 patterns}, the internal validation set
M2 = {(xu2, yu2) with u2 = 1,...,U2 = 248 patterns} and the testing [prediction, out-of-sample]
set M3 = {(xu3, yu3) with u3 = 1,...,U3 = 248 patterns}. M1 is used only for
parameter estimation, while M2 is used for validation. The generalisation performance of the
model is assessed on the testing set M3.
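The 496/248/248 partition described above can be sketched as follows. This is a minimal illustration of a plain random split only; the function name `split_indices`, the seed and the use of index lists are assumptions made for the example, and it does not reproduce whatever systematic splitting scheme was used for the constrained case.

```python
import random


def split_indices(n_total=992, n_train=496, n_val=248, seed=0):
    """Randomly partition range(n_total) into training, validation and testing indices.

    Hypothetical helper: a simple random split, not the authors' procedure.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    idx = list(range(n_total))
    rng.shuffle(idx)                   # random permutation of all pattern indices
    train = idx[:n_train]              # M1: parameter estimation
    val = idx[n_train:n_train + n_val]  # M2: internal validation
    test = idx[n_train + n_val:]       # M3: out-of-sample testing
    return train, val, test


m1, m2, m3 = split_indices()
print(len(m1), len(m2), len(m3))
```

The three index lists are disjoint and together cover all 992 patterns, so no pattern used for parameter estimation or validation leaks into the testing set.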
Though the simplicity of this method is appealing, an obvious concern is the necessary
reduction in the amount of training data. In deciding how to partition the data, a
compromise has been made between creating a test set large enough to fully test the
fitted model and still retaining a sufficient amount of training and internal validation
data. If the test set is too small then the variance of the prediction error estimate will be
high due to the small sample size. Though random splits are commonly used and appear
to work reasonably well in the case of unconstrained spatial interaction, a more
systematic splitting method had to be used in the case of constrained spatial interaction.
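The remark about test set size can be made concrete with a small sketch. Assuming, purely for illustration, that the per-pattern prediction errors are independent with a common standard deviation `sigma`, the standard error of their mean shrinks with the square root of the test set size. The function name and the sizes other than 248 are hypothetical.

```python
import math


def standard_error(sigma, n_test):
    """Standard error of the mean of n_test i.i.d. errors with std. dev. sigma.

    Assumes independent, identically distributed errors (an idealisation).
    """
    return sigma / math.sqrt(n_test)


# Compare a hypothetical small test set with the 248 patterns used for M3.
for n_test in (62, 248):
    print(n_test, standard_error(1.0, n_test))
```

Under this assumption, quartering the test set doubles the standard error of the error estimate, which is the trade-off against keeping enough data for training and validation.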
Table 1: Descriptive Statistics: The Training, Validation and Testing Sets
| Variables | Mean | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|
| Whole Set M | | | | |
| sj | 26,364,563 | 50,350,660 | 2,310,400 | 285,193,984 |
| dij | 229.4 | 124.6 | 30.0 | 630.0 |
| tij | 8.6 | 22.6 | 0.0 | 257.9 |
| ti. | 266.0 | 350.1 | 41.9 | 1830.1 |
| Training Set M1 | | | | |
| sj | 26,142,923 | 49,711,907 | 2,310,400 | 285,193,984 |
| dij | 234.1 | 129.6 | 35.0 | 630.0 |
| tij | 9.6 | 26.2 | 0.0 | 257.9 |
| ti. | 297.0 | 429.1 | 41.9 | 1830.1 |
| Validation Set M2 | | | | |
| sj | 26,517,946 | 50,891,071 | 2,310,400 | 285,193,984 |
| dij | 219.3 | 121.4 | 30.0 | 590.0 |
| tij | 7.1 | 16.6 | 0.0 | 166.8 |
| ti. | 220.9 | 221.4 | 45.6 | 759.8 |
| Testing Set M3 | | | | |
| sj | 26,654,459 | 51,069,577 | 2,310,400 | 285,193,984 |
| dij | 230.3 | 116.7 | 37.0 | 627.0 |
| tij | 8.0 | 19.7 | 0.0 | 195.2 |
| ti. | 249.0 | 262.5 | 55.3 | 895.7 |
Note: M consists of 992 patterns, M1 of 496 patterns, M2 of 248 patterns and M3 of 248
patterns.
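Summary statistics of the kind reported in Table 1 could be computed per subset along the following lines. This is a sketch only: the helper `describe` and the toy values are hypothetical and not taken from the data, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def describe(values):
    """Return mean, standard deviation, minimum and maximum of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) estimator; an assumption
        "min": min(values),
        "max": max(values),
    }


# Hypothetical toy distances, NOT values from the data set.
toy_distances = [30.0, 120.0, 250.0, 400.0, 630.0]
summary = describe(toy_distances)
print(summary)
```

Applying the same function to each variable within M, M1, M2 and M3 gives one table row per variable and subset, which makes it easy to check that the subsets have broadly similar distributions.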