Stata Technical Bulletin
STB-48
. sumdist eybhc [fw=wgt]
Warning: eybhc has 20 values < 0. Used in calculations
Distributional summary statistics, 10 quantile groups

----------------------------------------------------------------------------
Quantile  |
group     |     Quantile  % of median     Share, %      L(p), %        GL(p)
----------+-----------------------------------------------------------------
        1 |        92.25        47.44         2.94         2.94         6.85
        2 |       115.77        59.54         4.47         7.41        17.26
        3 |       141.27        72.65         5.49        12.90        30.05
        4 |       167.22        86.00         6.61        19.50        45.44
        5 |       194.45       100.00         7.76        27.26        63.53
        6 |       225.38       115.91         9.04        36.30        84.59
        7 |       263.34       135.43        10.44        46.75       108.93
        8 |       315.39       162.20        12.38        59.13       137.78
        9 |       402.21       206.85        15.20        74.33       173.20
       10 |                                  25.67       100.00       233.02
----------------------------------------------------------------------------
Share = quantile group share of total eybhc;
L(p)=Cumulative group share; GL(p)=L(p)*mean(eybhc)
The Quantile column gives estimates of the nine deciles (p10, p20, p30, ..., p90), which split the population into tenths
ordered by income (decile groups). The next column shows that p10 is about 47% of median income (= p50).
The Share column shows that the poorest tenth of the UK population in 1991 received less than 3% of total
income, whereas the richest tenth received more than 25% of total income.
The L(p) column shows cumulative quantile group income shares, in other words, Lorenz ordinates. A Lorenz curve
plots these ordinates against cumulative population shares, and is often used for inequality summaries
and inequality “dominance” comparisons (see, e.g., Cowell 1995, Lambert 1993). The GL(p) column shows the values of L(p)
multiplied by mean income. The generalized Lorenz curve is the Lorenz curve scaled up at each point by mean income, and is
often used for “welfare” dominance comparisons (Cowell 1995, Lambert 1993). sumdist is designed to provide a numerical
summary of these distributional features rather than the data points for drawing (generalized) Lorenz curves.
After all, with unit record data (as here), one might as well draw the graphs using all the data; see Jenkins and Van Kerm
(1999).
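The relationship between the Share, L(p), and GL(p) columns can be checked by hand. The sketch below is not part of sumdist. It re-enters the Share column from the table, cumulates it to obtain L(p), and scales by mean income to obtain GL(p). The mean, 233.02, is read off the table as the GL(p) value for the top group, where L(p) = 100%. The variable names group, share, Lp, and GLp are chosen for illustration only.

```stata
* Sketch: rebuild L(p) and GL(p) from the published Share column.
clear
input group share
 1  2.94
 2  4.47
 3  5.49
 4  6.61
 5  7.76
 6  9.04
 7 10.44
 8 12.38
 9 15.20
10 25.67
end
* Running sum of the shares gives the cumulative share L(p), in percent.
generate double Lp = sum(share)
* GL(p) is the Lorenz ordinate (as a proportion) times mean income.
generate double GLp = (Lp/100) * 233.02
list group share Lp GLp, noobs
```

Because the shares entered here are already rounded to two decimal places, the rebuilt columns can differ from the published ones in the second decimal place (for example, 19.51 rather than 19.50 for the fourth group).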
If instead we had typed
. sumdist eybhc [fw=wgt], n(5) qgp(quintgp)
the program would have provided the four quintiles (p20, p40, p60, p80) splitting the population into fifths ordered by income,
quintile group income shares, etc., and created a new variable quintgp recording quintile group membership.
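For comparison, similar quantities can be obtained with Stata's built-in _pctile and xtile commands. This is only an approximate stand-in, assuming the same eybhc and wgt variables as above. It is not sumdist's own code, and its handling of ties and interpolation may differ from sumdist's. The variable name quintgp2 is hypothetical.

```stata
* Sketch only: approximates n(5) qgp(quintgp) with built-in commands.
* The four quintiles are left in r(r1) through r(r4).
_pctile eybhc [fw=wgt], nquantiles(5)
display "p20 = " r(r1) "  p40 = " r(r2) "  p60 = " r(r3) "  p80 = " r(r4)
* Quintile group membership, coded 1 (poorest fifth) to 5 (richest fifth).
xtile quintgp2 = eybhc [fw=wgt], nquantiles(5)
tabulate quintgp2 [fw=wgt]
```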
xfrac: tabulation using categories defined by fractions of a cut-off value
xfrac provides a specialized tabulation (a “wrapper” for tabulate). Each valid observation is first assigned, according to its
value of varname, to one of 20 mutually exclusive categories. The category boundaries are “hard-wired” fractions, ranging
from 0.1 to 3.0, of a user-specified cut-off value expressed in the same units as varname. This classification is
then tabulated and, optionally, can be retained as a new variable.
An example may clarify. Let varname be a measure of income and the cut-off be mean income. xfrac shows the proportion
of observations with varname value less than 10% of mean income, between 10% and 20% of mean income, between 20%
and 30% of mean income, and so on (20 categories). Cumulative proportions are also shown. The hard-wired fractions of the
cut-off were chosen to match those used in the presentation of the UK official low income statistics (see, e.g., Department of
Social Security, 1993). Motivated users could easily modify the xfrac code and change the choices if desired.
In effect xfrac provides a discrete representation of the distribution function for varname.
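The classification step can be sketched with built-in commands. The code below is a hedged illustration, not xfrac's source. The cut points are a shortened, made-up list rather than the program's 20 hard-wired categories, the cut-off is taken to be weighted mean income, and the names ratio and fracgp2 are hypothetical.

```stata
* Sketch only: illustrative cut points, not xfrac's hard-wired list.
quietly summarize eybhc [fw=wgt]
local cutoff = r(mean)
* Express each income as a fraction of the cut-off.
generate double ratio = eybhc / `cutoff'
* irecode() returns 0 if ratio <= 0.1, 1 if 0.1 < ratio <= 0.2, and so on,
* up to 8 if ratio > 1.0. xfrac's treatment of boundary values may differ.
generate byte fracgp2 = irecode(ratio, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0)
* tabulate reports the percentage and cumulative percentage in each category.
tabulate fracgp2 [fw=wgt]
```

The cumulative percentage column of the resulting table is the discrete distribution function evaluated at each cut point.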
Syntax
xfrac varname [weight] [if exp] [in range] , cutoff(#) [gp(gpname)]
fweights and aweights are allowed.
The user must specify the cut-off, in the same units as varname, using cutoff(#).
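A minimal usage sketch following the syntax diagram above, assuming the same eybhc and wgt variables as in the sumdist example and taking weighted mean income as the cut-off. The new variable name fracgp is hypothetical.

```stata
* Weighted mean income is left in r(mean) by summarize.
quietly summarize eybhc [fw=wgt]
local meaninc = r(mean)
* Tabulate incomes by fractions of the mean and keep the classification.
xfrac eybhc [fw=wgt], cutoff(`meaninc') gp(fracgp)
```

Any other cut-off in the units of varname, such as the median or a fixed income threshold, could be passed to cutoff() in the same way.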