30
Stata Technical Bulletin
STB-22
As a consequence, we strongly recommend that you retain the previous version of the time series library. In the event that a
time series command behaves in an unexpected manner, simply revert to the previous version. In subsequent releases, the panel
data features will be extended to more and more of the programs in the time series library. When this extension becomes fully
integrated into the library, the older version can be erased.
Command to define cross-sectional units: The csunits command specifies the variables that identify cross-sectional units.
The syntax is
csunits varlist [, clear ]
csunits is the cross-sectional analog of the datevars command. The clear option is a convenience feature; it “erases”
the existing definition. csunits is illustrated in the example that follows the discussion of the lag command.
Generalization of the lag command: lag has been extended to handle panel data correctly. When a variable is lagged
(led), missing values are created at the beginning (end) of the time series. In panel data, the time series restart for each
cross-sectional unit. If the cross-sectional units have been defined by the csunits command, lag will operate on each
cross-sectional unit independently, similar to the way the by varlist: prefix operates on each by-group independently.
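The missing values that lagging and leading create can be seen in a minimal sketch that uses only Stata's explicit subscripting rather than the lag command itself. The variables t and y and their values are hypothetical, chosen purely for illustration, and the sketch is not meant to show how lag is implemented.

```stata
* Hypothetical five-period series (t and y are made-up names and values)
clear
input t y
1 10
2 20
3 30
4 40
5 50
end
sort t

* A one-period lag: there is no observation 0, so the first value is missing
generate lag_y = y[_n-1]

* A one-period lead: there is no observation 6, so the last value is missing
generate lead_y = y[_n+1]

list
```

With a single series, one missing value appears at each end. In panel data the same thing should happen at the start and end of every cross-sectional unit, which is what the extension to lag is described as providing.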
lag is the most heavily used command in the time series library. Almost every other program in the library calls lag. Thus,
this extension to lag may affect other programs in the library in unexpected ways. For example, tsreg and tsfit should
now handle panel data appropriately. Both these routines also call findsmpl to report sample coverage, but findsmpl
does not handle panel data, so the information on sample coverage should be suppressed or ignored. More importantly, the
time series features of regdiag, such as the Durbin-Watson statistic, do not yet handle panel data correctly. None of these
side effects is relevant unless you use panel data and you identify the cross-sectional units with the csunits command.
Here is a simple, artificial example that illustrates csunits and the new behavior of lag. We have observations on three
cross-sectional units defined by the variable id. We observe unit 100 from period 1 through period 5, unit 101 from period 3
through period 7, and unit 105 from period 2 through period 4. If no time series or cross section information is specified, lag
operates as before:
. lag x
. list
           id       time          x        L.x
  1.      100          1       3.21          .
  2.      100          2      67.10       3.21
  3.      100          3      98.30       67.1
  4.      100          4      62.96       98.3
  5.      100          5      24.59      62.96
  6.      101          3      89.84      24.59
  7.      101          4      33.59      89.84
  8.      101          5       4.07      33.59
  9.      101          6      31.31       4.07
 10.      101          7      78.12      31.31
 11.      105          2      94.58      78.12
 12.      105          3       8.63      94.58
 13.      105          4      89.53       8.63
lag assumes the data are in the appropriate order when it creates L.x. Note that lag does not respect the boundaries of the
cross-sectional units. For example, L.x is 24.59 in observation 6, the first observation on unit 101, but 24.59 is the value of x
in the last observation on unit 100.
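For comparison, the boundary-respecting result can be approximated by hand with the by prefix mentioned earlier. The sketch below re-enters the example data and builds the lag with explicit subscripting within each unit. The name lag_x is an assumption made for this illustration, and the code mimics the behavior described for lag after csunits rather than reproducing the library's implementation.

```stata
* Re-enter the example data shown above
clear
input id time x
100 1  3.21
100 2 67.10
100 3 98.30
100 4 62.96
100 5 24.59
101 3 89.84
101 4 33.59
101 5  4.07
101 6 31.31
101 7 78.12
105 2 94.58
105 3  8.63
105 4 89.53
end

* Order the observations by unit and, within unit, by period
sort id time

* Within each id, _n restarts at 1, so the first observation of every
* unit receives a missing lag instead of the previous unit's last value
by id: generate lag_x = x[_n-1]

list id time x lag_x
```

Under this approach the lag in observation 6, the first observation on unit 101, is missing rather than 24.59, and likewise for the first observation on unit 105.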
If time series information has been recorded with the period and datevars commands, the new version of lag will sort the data
before generating leads and lags:
. period 1
1 (annual)
. datevars time
. lag x
(note: L.x replaced)
. list
           id       time          x        L.x
  1.      100          1       3.21          .
  2.      105          2      94.58       3.21
  3.      100          2      67.10      94.58
  4.      100          3      98.30       67.1
  5.      105          3       8.63       98.3