24
Stata Technical Bulletin
STB-48
    60%     223.64366     0.36338  |  Std. Dev.       188.66945
    70%     259.48672     0.46592  |
    75%     282.24383     0.52353  |  Variance      35596.15976
    80%     310.64428     0.58654  |  Half CV^2         0.32290
    90%     406.43660     0.73643  |  Gini coeff.       0.33721
    95%     520.03530     0.83332  |  p90/p10           4.50111
    99%     894.92777     0.94313  |  p75/p25           2.11675
The likelihood values and estimates of the percentiles, inequality indices and other distribution parameters are remarkably
similar for both models.
All the estimates are also very similar to their nonparametric counterparts. For example, the nonparametric estimate of the
Gini coefficient is 0.333 and of the GE(2) index (half the squared coefficient of variation), 0.362: see the output from ineqdeco
in Jenkins (1999). Other nonparametric statistics can be obtained with summarize, detail:
. summarize eybhc [fw=wgt] if eybhc>0, detail
                    Equiv. net income BHC
-------------------------------------------------------------
      Percentiles      Smallest
 1%     41.10482       .0076653
 5%       79.116       1.938724
10%     92.79689       2.631398       Obs            55687900
25%     127.8417       2.808512       Sum of Wgt.    55687900

50%      195.036                      Mean           233.7762
                        Largest       Std. Dev.      198.8109
75%     287.5094
90%      402.397       2013.499       Variance       39525.79
95%     504.1051       3024.663       Skewness       14.44232
99%      818.264       7740.044       Kurtosis       484.1126
The greatest differences between the parametric and nonparametric estimates are at the very bottom and, especially, the very
top of the distribution. The latter difference is almost certainly due to the presence of a single high-income outlier; note, for
example, the large underestimation of the top-sensitive index GE(2), half the squared coefficient of variation (0.323 from the
fitted model against 0.362 nonparametrically). In some cases, one might argue that the parametric estimates are the more
reliable, on the grounds that income data in the extreme tails of the distribution are themselves unreliable.
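The comparison can be checked by hand from the summarize, detail output above. The following is a minimal sketch in Python (not part of the original programs, and not how ineqdeco computes its indices): it recomputes half the squared coefficient of variation and the two percentile ratios from the printed mean, variance, and percentiles. The variable names are illustrative only.

```python
# Nonparametric figures copied from the summarize, detail output above.
mean, variance = 233.7762, 39525.79
p10, p25, p75, p90 = 92.79689, 127.8417, 287.5094, 402.397

# GE(2) = half the squared coefficient of variation = variance / (2 * mean^2).
half_cv2 = 0.5 * variance / mean ** 2
ratio_90_10 = p90 / p10
ratio_75_25 = p75 / p25

# Parametric counterparts, copied from the fitted-model table on this page.
fitted = {"Half CV^2": 0.32290, "p90/p10": 4.50111, "p75/p25": 2.11675}
nonparametric = {
    "Half CV^2": half_cv2,
    "p90/p10": ratio_90_10,
    "p75/p25": ratio_75_25,
}

for name, value in nonparametric.items():
    print(f"{name:10s} nonparametric {value:8.4f}   fitted {fitted[name]:8.4f}")
```

Half CV^2 comes out at about 0.362, in line with the nonparametric GE(2) quoted earlier, and the percentile ratios at about 4.34 and 2.25. The fitted model therefore understates GE(2) and p75/p25 and overstates p90/p10.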
Goodness-of-fit may also be assessed graphically using probability plots. The psm, qsm, pdagum, and qdagum programs
written by Cox (1999) provide these using estimates produced by smfit and dagumfit.
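For readers unfamiliar with probability plots, the idea can be sketched as follows. This is an illustration in Python, not the code of psm or pdagum: it pairs each sorted observation's empirical plotting position with the fitted distribution function at that observation, using the standard Singh-Maddala and Dagum distribution functions. The parameter values and the small sample are hypothetical, the plotting position (i - 0.5)/n is an assumption, and weights are ignored.

```python
def sm_cdf(x, a, b, q):
    """Singh-Maddala distribution function: F(x) = 1 - [1 + (x/b)^a]^(-q)."""
    return 1.0 - (1.0 + (x / b) ** a) ** (-q)


def dagum_cdf(x, a, b, p):
    """Dagum distribution function: F(x) = [1 + (x/b)^(-a)]^(-p)."""
    return (1.0 + (x / b) ** (-a)) ** (-p)


def pp_points(data, cdf):
    """Return (empirical, fitted) probability pairs for a P-P plot.

    Empirical positions are (i - 0.5)/n for the i-th smallest value
    (an assumed convention; other plotting positions are in use).
    """
    xs = sorted(data)
    n = len(xs)
    return [((i + 0.5) / n, cdf(x)) for i, x in enumerate(xs)]


# Hypothetical unweighted sample and hypothetical parameter values.
incomes = [80.0, 120.0, 150.0, 195.0, 240.0, 300.0, 420.0, 800.0]
a, b, q = 3.0, 200.0, 1.0

points = pp_points(incomes, lambda x: sm_cdf(x, a, b, q))
for empirical, fitted_p in points:
    print(f"{empirical:.4f}  {fitted_p:.4f}")
```

If the model fits well, the pairs lie close to the 45-degree line. Systematic departures at either end of the plot indicate misfit in the corresponding tail.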
The similarity of estimates in the example appears contrary to the claim sometimes made in the literature that the Dagum
distribution typically provides a better fit than the Singh-Maddala one. Results can perhaps be reconciled by observing that in
virtually all cases reported to date, estimates have been derived from grouped (banded) income data rather than unit record data
as here.
Other criteria besides goodness-of-fit may be relevant to a choice between smfit and dagumfit. The main difference I
have found is in convergence stability and time. In all the applications I have experimented with, smfit has converged quickly
in only a few iterations from the default starting values. By contrast, dagumfit typically took many more iterations and in
fact sometimes failed to converge using the default starting values (try fitting the Dagum distribution to the variable price in
auto.dta). In the illustration shown above, smfit took about a minute to converge using a Pentium P1/166 PC running Stata 5.0
for Windows 95, but dagumfit required almost 18 minutes. Part of the problem is that it is difficult to specify good default
starting values for dagumfit. In all the cases where the program did not converge, experimentation with a range of alternative
starting values led eventually to convergence. Use of the trace option is therefore recommended in all initial fits.
Acknowledgments
This work forms part of the scientific research program of the Institute for Social and Economic Research, and was supported
by core funding from the University of Essex and the UK Economic and Social Research Council. The programs
are revisions and extensions of some presented at the 4th UK Stata Users’ Group meeting. Markus Jantti and Nick Cox made
helpful comments on earlier versions of the programs.
References
Cox, N. J. 1999. gr35: Diagnostic plots for assessing Singh-Maddala and Dagum distributions fitted by MLE. Stata Technical Bulletin 48: 2-4.
Dagum, C. 1977. A new model of personal income distribution: specification and estimation. Economie Appliquée 30: 413-437.
--. 1980. The generation and distribution of income, the Lorenz curve and the Gini ratio. Economie Appliquée 33: 327-367.
Jenkins, S. P. 1999. sg104: Analysis of income distributions. Stata Technical Bulletin 48: 4-18.