Stata Technical Bulletin
45
mean     mean over interval
sum      sum over interval
gmean    geometric mean over interval (for a positive variable)
first    first observation in the interval
last     last observation in the interval
If stat is not specified, mean is assumed.
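The gmean statistic can be pictured as the exponential of the arithmetic mean of logs, which is why it requires a positive variable. The following is a minimal sketch of that identity using only official Stata commands. It is not tscollap's own code, and the toy data and the variable names (month, x, lnx, qtr, xgmean) are assumptions made for illustration.

```stata
* Toy data (assumed): three monthly values 1, 2, 4 in the first quarter of 1965.
clear
set obs 3
generate month = ym(1965, 1) + _n - 1
format month %tm
generate x = 2^(_n - 1)

* Geometric mean over the quarter = exp(mean of ln(x)); x must be positive.
generate qtr = qofd(dofm(month))
format qtr %tq
generate double lnx = ln(x)
collapse (mean) lnx, by(qtr)
generate double xgmean = exp(lnx)
list qtr xgmean
```

For these three values the geometric mean is the cube root of 8, that is, 2, whereas the arithmetic mean would be 7/3.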
The tscollap command is for use with time series data of monthly, quarterly, or half-yearly frequency. You must tsset
your data before using this command; see [U] tsset. If the data are a panel of time series, that is, if a panelvar has been specified
in tsset, the specification of the panel identification variable will automatically be retained in the resulting dataset (it need not,
and should not, be specified in the varlist). Time series operators may not be used.
The tsmktim command (described in dm81; see pages 2-4) is a convenient way to generate the appropriate tsset command
if you do not already have a time variable in the data.
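If tsmktim is not installed, a monthly time variable can also be built by hand before calling tsset. This is a minimal sketch using official Stata functions only; the start date (1965m1), the variable name month, and the 24-observation toy dataset are assumptions made for illustration.

```stata
* Sketch: create a monthly time variable by hand and declare it with tsset.
* Assumes the observations are already in chronological order with no gaps.
clear
set obs 24
* Consecutive monthly dates starting at 1965m1.
generate month = ym(1965, 1) + _n - 1
* Display the values as 1965m1, 1965m2, ...
format month %tm
tsset month, monthly
```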
Options
to(freq) specifies the target frequency and is required. The target may be any frequency lower than the current frequency as
understood by tsset. freq must be given as q, h, or y, in either lowercase or uppercase.
generate(freqvar) may be used to specify the name of the new tsset variable, which will be formatted at the target frequency.
Description
tscollap converts the time series data in memory into a dataset of means, sums, or selected values taken from the specified
interval. It is a variant of collapse, which automatically forms the groups over which statistics are to be calculated from an
understanding of the calendar data. For instance, monthly data may be converted to quarterly, half-yearly, or annual (yearly)
data by specifying to(q), to(h), or to(y), respectively. Data may be averaged over the interval (using either an arithmetic
or geometric mean) or summed (as would be appropriate for income statement data). Either the first or the last observation
of each interval may be selected (so that, e.g., end-of-period values may be readily assembled). Since its syntax (and internal
logic) is taken from collapse, more than one statistic may be generated from a single variable; for example, both average and
end-of-period values may be obtained by specifying different target-var names. tscollap embodies the June 2000 correction to
collapse.
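The grouping that tscollap forms from the calendar can be approximated by hand with the official collapse command. The sketch below is an illustration under stated assumptions, not tscollap's implementation (which relies on _gfilter, as noted under Remarks). The toy data and the variable names month, x, qtr, xmean, xsum, xfirst, and xlast are hypothetical.

```stata
* Toy monthly series (assumed): x = 1, 2, ..., 12 for the months of 1965.
clear
set obs 12
generate month = ym(1965, 1) + _n - 1
format month %tm
generate x = _n

* Map each month to its quarter, then collapse by quarter.
* first and last depend on the sort order, so sort by month beforehand.
generate qtr = qofd(dofm(month))
format qtr %tq
sort month
collapse (mean) xmean=x (sum) xsum=x (first) xfirst=x (last) xlast=x, by(qtr)
tsset qtr, quarterly
list
```

Replacing qofd() with hofd() or yofd() (and the %tq format with %th or %ty) gives the half-yearly and annual analogues.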
All variables not specified in the target list are dropped (including the current tsset variable), and a new tsset variable
is generated as freqfreq (as long as that variable does not already exist). The generate option may be used to customize the
new tsset variable. If a panelid variable is in use by tsset, it should not be listed; it will be automatically retained.
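For panel data, the behavior described here corresponds to collapsing within each panel unit and carrying the panel identifier along. A rough equivalent with official commands is sketched below; the two-unit toy panel and the variable names id, month, x, and qtr are assumptions, and the code is not taken from tscollap itself.

```stata
* Toy panel (assumed): two units, six months each, starting at 1965m1.
clear
set obs 12
generate id = cond(_n <= 6, 1, 2)
bysort id: generate month = ym(1965, 1) + _n - 1
format month %tm
generate x = _n
tsset id month, monthly

* Collapse to quarterly means within each panel unit; id is kept via by().
generate qtr = qofd(dofm(month))
format qtr %tq
collapse (mean) x, by(id qtr)
tsset id qtr, quarterly
list
```

With tscollap the panel variable is picked up from tsset, so it is not listed; in this hand-written version it must appear in by().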
Saved results
tscollap saves the items returned by tsset in r().
Remarks
tscollap makes substantial use of _gfilter, part of the egenmore package of N. J. Cox, information about which is
available by issuing the command webseek egenmore.
Examples
Monthly data from Terence Mills’ Econometric Analysis of Financial Time Series on UK FTA All Share stock prices (ftap)
and dividends (ftadiv) are compacted to a quarterly frequency.
. use http://fmwww.bc.edu/ec-p/data/Mills2d/fta.dta
. tscollap ftap ftadiv, to(q)
Converting from M to Q
time variable: q_q, 1965q1 to 1995q4
. use http://fmwww.bc.edu/ec-p/data/Mills2d/fta.dta
. tscollap ftap (first) ftapf=ftap (last) ftapl=ftap, to(q) gen(qtr)
Converting from M to Q
time variable: qtr, 1965q1 to 1995q4
In the first example, the price and dividend series are averaged over each quarter, and the other series in the original dataset
are discarded. In the second example, the price series is used to generate three variables: the average price per quarter, the
price in the first month of each quarter, and the price in the last month of each quarter, as ftap, ftapf, and ftapl, respectively.