Stata Technical Bulletin
sumdist: distribution summary statistics, by quantile group
sumdist estimates distributional summary statistics commonly used by income distribution analysts, complementing those
available via pctile, xtile, and summarize, detail. In fact much of sumdist is a “wrapper” for xtile, combined with
tabdisp to display the results of by-group calculations.
For variable x and distribution function F(x), the statistics provided are
(1) quantiles x_k, k = 1, 2, ..., m - 1, where m is the number of quantile groups;
(2) the quantiles expressed as a percentage of median(x);
(3) the quantile group share of x in total x (group income share, %);
(4) the cumulative quantile group shares of total x (with cumulation in ascending order of x), i.e., the Lorenz ordinates L(p_k)
at each p_k = F(x_k) for quantile points x_k; and
(5) the generalized Lorenz ordinates at each p_k = F(x_k), i.e., GL(p_k) = mean(x) * L(p_k).
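The five statistics can be reproduced by hand from the definitions above. The following is a minimal sketch only, not sumdist's actual code: it assumes Stata's shipped auto dataset, uses price as a stand-in for x, sets m = 5, and ignores weights (a weight clause could be added to pctile, xtile, summarize, and collapse). All variable and scalar names are illustrative.

```stata
* Illustrative sketch: statistics (1)-(5) computed by hand, unweighted
sysuse auto, clear
local m = 5                          // number of quantile groups (assumed)
local k = `m' - 1                    // number of quantile points

* (1) quantile points x_k, and quantile group membership
pctile double q = price, nquantiles(`m')
xtile grp = price, nquantiles(`m')

* (2) quantiles as a percentage of the median
quietly summarize price, detail
scalar med = r(p50)
scalar mu  = r(mean)
scalar tot = r(sum)
generate double q_pct_median = 100 * q / scalar(med)
list q q_pct_median in 1/`k', noobs

* (3)-(5) group shares, Lorenz and generalized Lorenz ordinates
collapse (sum) gsum = price, by(grp)
generate double share = 100 * gsum / scalar(tot)   // (3) group share, %
generate double L     = sum(gsum) / scalar(tot)    // (4) cumulative share L(p_k)
generate double GL    = scalar(mu) * L             // (5) GL(p_k) = mean(x) * L(p_k)
list grp share L GL, noobs
```

By construction the shares sum to 100, the last Lorenz ordinate equals 1, and the last generalized Lorenz ordinate equals the mean.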
Syntax
sumdist varname [weight] [if exp] [in range] [, ngps(#) qgp(gpname)]
fweights and aweights are allowed.
Options
ngps(#) specifies the number of quantile groups. Valid values are integers in the range (0,100]. The default is 10.
qgp(gpname) creates a new categorical variable, gpname, containing categories summarizing quantile group membership, with
the number of categories equal to m.
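The calls below illustrate the syntax and options. They are a sketch only: they assume that sumdist is installed and that the data in memory contain the income variable eybhc and the weight wgt used in the example that follows. The name quintile is an arbitrary choice for the new group variable.

```stata
* Default: 10 quantile groups
sumdist eybhc [fw=wgt]

* Five quantile groups, saving group membership in a new variable
* (the name "quintile" is hypothetical)
sumdist eybhc [fw=wgt], ngps(5) qgp(quintile)
tabulate quintile [fw=wgt]

* Restrict the calculation to strictly positive incomes
sumdist eybhc [fw=wgt] if eybhc > 0, ngps(5)
```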
Example
We shall follow a conventional approach and examine the distribution of income amongst all persons in the population,
assuming that each person receives the needs-adjusted income of the household to which s/he belongs. Thus we focus on the
distribution of the variable eybhc weighted by wgt.
A summarize, detail shows some standard features of income distributions, namely significant dispersion combined with
skewness: the mean is well above the median, and there is a long upper tail. (A more sophisticated analysis might consider the
sensitivity of conclusions to differing treatments of the “outlier” largest income.)
. summarize eybhc [fw=wgt], de
                    Equiv. net income BHC
-------------------------------------------------------------
      Percentiles      Smallest
 1%        29.04      -123.9898
 5%     78.43056      -72.37004
10%     92.24828      -42.89144       Obs            55851705
25%     127.3008      -42.70588       Sum of Wgt.    55851705

50%     194.4472                      Mean           233.0179
                        Largest       Std. Dev.      199.0178
75%     287.2739       1846.438
90%      402.212       2013.499       Variance       39608.08
95%     503.1029       3024.663       Skewness       14.35982
99%      818.264       7740.044       Kurtosis        480.917
Observe the presence of negative and zero incomes in the data. It is up to the user to decide how to handle these. In
general there may be arguments for or against excluding them, which vary with circumstances. By default sumdist retains
these values, but they can be excluded using the if qualifier. An example of default output is as follows: