992 samples into three subsets^12: the training [in-sample] set
M1 = {(xu1, yu1) with u1 = 1,...,U1 = 496 patterns}, the internal validation set
M2 = {(xu2, yu2) with u2 = 1,...,U2 = 248 patterns} and the testing [prediction,
out-of-sample] set M3 = {(xu3, yu3) with u3 = 1,...,U3 = 248 patterns}. M1 is used only for
parameter estimation, and M2 only for validation. The generalisation performance of the
model is assessed on the testing set M3.
Though the simplicity of this method is appealing, an obvious concern is the necessary
reduction in the amount of training data. In deciding how to partition the data, a
compromise has been made between creating a test set large enough to test the fitted
model thoroughly and retaining a sufficient amount of training and internal validation
data. If the test set is too small then the variance of the prediction error estimate will be
high due to the small sample size. Though random splits are commonly used and appear
to work reasonably well in the case of unconstrained spatial interaction, a more
systematic splitting method had to be used in the case of constrained spatial interaction.
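As a rough illustration of the subset sizes described above, the following Python sketch partitions 992 pattern indices into sets of 496, 248 and 248 by a plain random shuffle. The function name `split_patterns`, the seed and the random shuffle itself are assumptions made for illustration only; the systematic splitting method used for constrained spatial interaction is not specified in this passage and is not reproduced here.

```python
import random


def split_patterns(n_patterns=992, sizes=(496, 248, 248), seed=0):
    """Randomly partition pattern indices into training, validation and test sets.

    Illustrative only: a plain random split, not the systematic method
    mentioned in the text for the constrained case.
    """
    if sum(sizes) != n_patterns:
        raise ValueError("subset sizes must sum to the number of patterns")
    indices = list(range(n_patterns))
    # Hypothetical seed, fixed only so the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_train, n_val, _ = sizes
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


m1, m2, m3 = split_patterns()
print(len(m1), len(m2), len(m3))  # 496 248 248
```

The three index lists are disjoint and together cover every pattern, so M1 can be used for parameter estimation, M2 for validation and M3 for the out-of-sample assessment without overlap.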
Table 1: Descriptive Statistics: The Training, Validation and Testing Sets

| Variables         | Mean       | Standard Deviation | Minimum   | Maximum     |
|-------------------|------------|--------------------|-----------|-------------|
| Whole Set M       |            |                    |           |             |
| sj                | 26,364,563 | 50,350,660         | 2,310,400 | 285,193,984 |
| dij               | 229.4      | 124.6              | 30.0      | 630.0       |
| tij               | 8.6        | 22.6               | 0.0       | 257.9       |
| ti.               | 266.0      | 350.1              | 41.9      | 1830.1      |
| Training Set M1   |            |                    |           |             |
| sj                | 26,142,923 | 49,711,907         | 2,310,400 | 285,193,984 |
| dij               | 234.1      | 129.6              | 35.0      | 630.0       |
| tij               | 9.6        | 26.2               | 0.0       | 257.9       |
| ti.               | 297.0      | 429.1              | 41.9      | 1830.1      |
| Validation Set M2 |            |                    |           |             |
| sj                | 26,517,946 | 50,891,071         | 2,310,400 | 285,193,984 |
| dij               | 219.3      | 121.4              | 30.0      | 590.0       |
| tij               | 7.1        | 16.6               | 0.0       | 166.8       |
| ti.               | 220.9      | 221.4              | 45.6      | 759.8       |
| Testing Set M3    |            |                    |           |             |
| sj                | 26,654,459 | 51,069,577         | 2,310,400 | 285,193,984 |
| dij               | 230.3      | 116.7              | 37.0      | 627.0       |
| tij               | 8.0        | 19.7               | 0.0       | 195.2       |
| ti.               | 249.0      | 262.5              | 55.3      | 895.7       |

Note: M consists of 992 patterns, M1 of 496 patterns, M2 of 248 patterns and M3 of 248
patterns.
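The four summary columns of Table 1 can be computed for any variable of any subset along the following lines. This is a minimal sketch: the helper name `describe` and the example values are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which form was used.

```python
import statistics


def describe(values):
    """Return the mean, standard deviation, minimum and maximum of a sequence."""
    return {
        "mean": statistics.mean(values),
        # Sample standard deviation (divisor n - 1); an assumption, see lead-in.
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical distance values, not taken from the data set.
example_dij = [30.0, 120.0, 230.0, 410.0, 630.0]
summary = describe(example_dij)
print(summary["mean"], summary["min"], summary["max"])  # 284.0 30.0 630.0
```

Applying the same helper to each variable within M, M1, M2 and M3 would give one table row per variable and subset.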