to each specific income group. The relation can
be represented as:
(1) $E = \sum_{i=1}^{n} \left[ a_i + b_i (Y - Y_i) \right] S_i + U, \quad i = 1, 2, \ldots, n,$
where E and Y represent household food expenditure and
income, respectively. Y is divided into n segments, where
$Y_1, \ldots, Y_{n+1}$ define the n+1 end points, called knots.
$S_i$ is a dummy variable whose value is 1 for all observations
such that $Y_i \le Y < Y_{i+1}$, and 0 otherwise. U is a random
disturbance associated with E.
In general, equation (1) allows a discontinuity at
each knot $Y_i$. In addition, a curvilinear relationship is
generally considered more appropriate for the
income-expenditure relation than the linear ap-
proximation. Spline functions overcome these
limitations by replacing the linear formulation of
equation (1) with polynomial approximations.
However, the number and position of knots and
the degrees of the polynomial pieces may vary in
different situations and are the major difficulties
confronted in estimating spline functions.
If each knot is defined as a variable, its posi-
tion must be estimated and entered into the re-
gression problem in a nonlinear fashion, and all
the problems arising in nonlinear regression are
present.1 Although some research in this direc-
tion has been done (Bellman and Roth; Gallant
and Fuller; McGee and Carleton), the use of
variable-knot splines requires very large amounts
of computation to find knot locations that give an
absolute minimum for the residual sum of
squares, and the testing of hypotheses is virtually
impossible (Smith). However, spline function estimation
with fixed knots is straightforward,
using standard regression procedures (e.g.,
Barth, Kraft and Kraft; Suits, Mason and Chan;
Poirier; Smith).
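The contrast between fixed and variable knots can be sketched as follows. The example assumes a continuous linear spline with a single knot and synthetic, noise-free data; the grid search over candidate knots is only an illustration of why the knot position enters nonlinearly, not the procedure of any of the cited authors.

```python
import numpy as np

def fit_fixed_knot(y, e, knot):
    """OLS fit of a continuous linear spline with one fixed knot.

    Returns the coefficient vector and the residual sum of squares.
    """
    X = np.column_stack([np.ones_like(y), y, np.maximum(y - knot, 0.0)])
    coef, *_ = np.linalg.lstsq(X, e, rcond=None)
    resid = e - X @ coef
    return coef, float(resid @ resid)

# Synthetic data with a true kink at income 5 (illustrative values only).
y = np.linspace(0.0, 10.0, 101)
e = 1.0 + 0.8 * y - 0.5 * np.maximum(y - 5.0, 0.0)

# Fixed knot: one linear regression suffices.
coef, rss = fit_fixed_knot(y, e, knot=5.0)

# Unknown knot: the residual sum of squares has to be searched over positions,
# with one regression per candidate.
candidates = np.arange(1.0, 9.5, 0.5)
rss_by_knot = [fit_fixed_knot(y, e, k)[1] for k in candidates]
best_knot = float(candidates[int(np.argmin(rss_by_knot))])
```

With several knots the search grows combinatorially, which is consistent with the computational burden noted above.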
With respect to the degrees of the polynomial
pieces, there is no a priori basis for choosing a
specific degree. However, a spline function with
polynomials of degree three, that is, a cubic spline,
is the most common form used in practice. In general,
cubic splines are used because they are of low degree;
are fairly smooth, given continuity restrictions up to
the second derivative; and yet can improve the fit
significantly, about as well as a polynomial of higher
degree.
In this study, a cubic spline function with fixed
knots is assumed, and the range of household income
is divided into n segments. Equation (1) now becomes
(2) $E = \sum_{i=1}^{n} \left[ a_i + b_i (Y - Y_i) + c_i (Y - Y_i)^2 + d_i (Y - Y_i)^3 \right] S_i + U.$
To ensure that equation (2) is smooth at each knot,
constraints on the coefficients are required. These
constraints make the function and its first and second
derivatives continuous at the knots. Thus,
(3) $a_i = a_{i-1} + b_{i-1}(Y_i - Y_{i-1}) + c_{i-1}(Y_i - Y_{i-1})^2 + d_{i-1}(Y_i - Y_{i-1})^3,$
$b_i = b_{i-1} + 2c_{i-1}(Y_i - Y_{i-1}) + 3d_{i-1}(Y_i - Y_{i-1})^2,$
$c_i = c_{i-1} + 3d_{i-1}(Y_i - Y_{i-1}), \quad i = 2, 3, \ldots, n.$
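The effect of these constraints can be checked numerically. The sketch below applies the three recursions to hypothetical starting coefficients and then compares adjacent cubic pieces at each interior knot; all numerical values and function names are assumptions made for illustration.

```python
def segment_coefficients(knots, a1, b1, c1, d):
    """Apply the constraints in (3): derive a_i, b_i, c_i for i >= 2 from
    a_1, b_1, c_1 and the cubic coefficients d_1, ..., d_n."""
    a, b, c = [a1], [b1], [c1]
    for i in range(1, len(d)):
        h = knots[i] - knots[i - 1]  # Y_i - Y_{i-1}
        # Each right-hand side uses only the coefficients of the previous segment.
        a.append(a[-1] + b[-1] * h + c[-1] * h**2 + d[i - 1] * h**3)
        b.append(b[-1] + 2 * c[-1] * h + 3 * d[i - 1] * h**2)
        c.append(c[-1] + 3 * d[i - 1] * h)
    return a, b, c

def piece(i, y, knots, a, b, c, d):
    """Value, first derivative and second derivative of segment i (0-based) at y."""
    t = y - knots[i]
    value = a[i] + b[i] * t + c[i] * t**2 + d[i] * t**3
    first = b[i] + 2 * c[i] * t + 3 * d[i] * t**2
    second = 2 * c[i] + 6 * d[i] * t
    return value, first, second

knots = [0.0, 10.0, 20.0, 30.0]  # hypothetical knots
a1, b1, c1 = 2.0, 0.5, -0.01     # hypothetical first-segment coefficients
d = [0.001, -0.002, 0.0005]      # hypothetical cubic coefficients d_1, d_2, d_3

a, b, c = segment_coefficients(knots, a1, b1, c1, d)
```

Evaluating `piece(i - 1, knots[i], ...)` and `piece(i, knots[i], ...)` gives the same value, slope and second derivative at every interior knot, so only the third derivative is free to change between segments.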
By substituting (3) into equation (2), and col-
lecting terms with the same coefficient, equation
(2) can be expressed as
$E = a_1 \sum_{i=1}^{n} S_i + b_1 (Y - Y_1) \sum_{i=1}^{n} S_i + c_1 (Y - Y_1)^2 \sum_{i=1}^{n} S_i + d_1 (Y - Y_1)^3 \sum_{i=1}^{n} S_i + \sum_{i=2}^{n} \left[ (d_i - d_{i-1}) (Y - Y_i)^3 \sum_{j=i}^{n} S_j \right] + U,$
or
(4) $E = a_1 + b_1 (Y - Y_1) + c_1 (Y - Y_1)^2 + d_1 (Y - Y_1)^3 + \sum_{i=2}^{n} (d_i - d_{i-1}) (Y - Y_i)^3 S_i^{*} + U,$
where $S_i^{*}$ is a new set of dummy variables, such
that $S_i^{*} = 1$ if and only if $Y \ge Y_i$; otherwise
$S_i^{*} = 0$.
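Because equation (4) is linear in its coefficients, it can be estimated by ordinary least squares once the regressors are built. The following is a minimal sketch under stated assumptions: the knots, the synthetic incomes, the coefficient values and the noise level are all hypothetical, and NumPy's least-squares routine stands in for whatever regression package is actually used.

```python
import numpy as np

def cubic_spline_design(y, knots):
    """Regressors of equation (4): 1, (Y-Y_1), (Y-Y_1)^2, (Y-Y_1)^3, and
    (Y-Y_i)^3 times the dummy for Y >= Y_i, for each interior knot Y_2..Y_n."""
    y = np.asarray(y, dtype=float)
    t = y - knots[0]
    cols = [np.ones_like(y), t, t**2, t**3]
    for k in knots[1:-1]:  # interior knots only
        cols.append(np.where(y >= k, (y - k) ** 3, 0.0))
    return np.column_stack(cols)

knots = [0.0, 10.0, 20.0, 30.0]                # hypothetical knots, n = 3
rng = np.random.default_rng(0)
income = rng.uniform(0.0, 30.0, size=200)      # synthetic incomes
true_coef = np.array([2.0, 0.5, -0.01, 0.001, -0.003, 0.0025])

X = cubic_spline_design(income, knots)
expenditure = X @ true_coef + rng.normal(0.0, 0.05, size=200)

coef, *_ = np.linalg.lstsq(X, expenditure, rcond=None)
# coef is ordered as [a_1, b_1, c_1, d_1, d_2 - d_1, d_3 - d_2];
# the cumulative sum of the last entries recovers d_1, d_2, d_3.
d_hat = np.cumsum(coef[3:])
```

Each coefficient on a truncated cubic term measures the change in the cubic coefficient at that knot, so a test that such a coefficient is zero is a test that the two adjacent segments share the same cubic.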
Given the basic formulation of equation (4),
the model can be generalized to fit a spline func-
tion that involves more than one independent
variable (Suits, Mason, and Chan). This analysis
incorporates the additional variable of household
size in the same manner as the income variable in
the regression.2 Hence, m segments of household
size within the sample range were established
and added to equation (4).3 The final estimating
1 Wold argues that the choice of knot positions in a spline function can be viewed as analogous to the specification of functional form in a traditional curve fitting problem.
Hence, the knots should be chosen to correspond to the overall behavior of the data rather than be considered as parameters.
2 A potential difficulty with this formulation may arise because household size is a discrete variable. This suggests that the scatter of observations is distributed as isolated
groups, with gaps between each household size instead of scattered throughout the observed range. Thus, a spline function for a discrete variable is less restrictive because it
is freer to move through the sparse parts of the data, as compared with a continuous variable. Consequently, it may lead to spurious curvature. However, judging from the
results obtained in the study, this does not seem to be the case. The potential pitfall of creating spurious curvature in the case of a discrete variable may be reduced if the
knots are kept at a minimum number, or if the entire observed range is used, so that the scatter of observations can still exert discipline over the curvature of the function.
3 If one expects that a change in household income affects household size and/or vice versa, then it would be appropriate to include an additional variable in the model to
account for a possible interaction effect between household income and size. Preliminary investigations of the sample data suggest that little relationship exists between
household income and size (r≈0.07). Therefore, it seems reasonable to assume that household income and size are independent in the formulation of the model. Furthermore,
the data indicate that household income and size are significantly correlated with the income-size interaction (r=0.80 and 0.56, respectively). The addition of an interaction
variable would likely introduce problems of multicollinearity to the statistical model, and, hence, reduce the reliability of the results.