Stata Technical Bulletin
25
Locating the first occurrence of an event
firstocc is a utility to identify the first occurrence of an event for each individual in an ordered sequence of observations
and (optionally) to store the corresponding observations in a new file. The syntax of firstocc is
    firstocc ident_var rank_var = exp [if exp] [in range], { generate(new_var) | saving(newfile) }
Either the generate() or the saving() option must be specified. If the generate() option is chosen, firstocc generates a
0/1 variable, new_var, that is equal to ‘1’ when exp is true for the first time in a sequence of ordered observations. When
the saving() option is chosen, firstocc creates a new file that contains only the observations where new_var would be equal
to ‘1’.
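The rule behind generate() can be imitated with built-in commands. The following is only a sketch in current Stata syntax, not the code of firstocc itself; the toy data and the variable names event, nevent, and first are invented for illustration, and the sketch assumes that no individual has two observations with the same age.

```stata
* Hypothetical toy data: one row per change of state
clear
input id age activity
1 20 0
1 27 1
1 35 2
2 18 0
2 22 2
2 30 1
3 25 0
end

* event plays the role of "= exp" (here: activity is 1 or 2)
generate byte event = (activity==1 | activity==2)

* Within each id, sort by age and keep a running count of events
bysort id (age): generate int nevent = sum(event)

* first is 1 only on the earliest row where the event is true
generate byte first = (event==1 & nevent==1)

list id age activity first, sepby(id)
```

In this toy data, first is 1 at age 27 for id 1 and at age 22 for id 2, and it is 0 everywhere for id 3, who never enters state 1 or 2. Keeping only the rows with first equal to 1 and saving them under a new name would correspond roughly to what the saving() option is described as doing.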
Using the same example data set, we can use firstocc to retain only the observations on the first period of labor market
activity. The ranking (ordering) variable in this example is the individual’s age:
    . use example, clear
    . firstocc id age = activity==1 | activity==2, saving(job1)
    . list id sex activity age lost

            id      sex   activity   age   lost
      1.     1     Male    selfemp    27     in
      2.     3   Female    selfemp    22     in
      3.     4     Male    selfemp    23    out
      4.     5     Male      waged    11     in
      5.     6   Female      waged    27    out
Make sure your original file is saved before running firstocc with the saving() option, because firstocc will delete
observations from your original data set without asking you for any confirmation.
Creating multiple observations for fixed-time transitions
slice is used in survival analysis to add observations for times or ages that may not have been directly observed. In the
example data set used above, an observation is added only when an individual changes job status. If an individual is recorded as
‘nonactiv’ at age 20 and ‘selfemp’ at age 30, the individual’s states at ages 21-29 can be inferred, but they are not recorded
in the data set. slice fills in observations for any desired age or time interval, making it easy to analyze the distribution of
states for any arbitrary interval. This explanation of slice may sound a bit convoluted, but the operation of slice is easily
understood from an example or two.
The syntax of slice is
    slice timevar [if exp] [in range], generate(new_var) interval(interval)
          saving(file_name) tvid(varname) [ nolabel start(varname) ]
timevar specifies the time at the end of the period represented by each observation. Observations with missing timevar are
ignored and are stored after the others for each individual. Note that the number of observations added to the original data can
be substantial and is dependent on the number of periods that individuals pass through. You may need to repartition memory
with the memsize command before using slice.
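To see why the number of added observations can grow quickly, it may help to fill in every single year of age by hand. The sketch below uses built-in commands in current Stata syntax and is not how slice is implemented. It takes age as the end of the period each observation represents, as described for timevar above; the toy data and the variable names prev, n, and year are invented for illustration, and each individual is assumed to have at most one observation per age.

```stata
* Hypothetical toy data: two recorded states for one individual
clear
input id age str8 activity
1 20 "nonactiv"
1 30 "selfemp"
end

* Assumption: age marks the END of the period a row describes,
* so the row with age 30 covers ages 21 through 30
bysort id (age): generate prev = age[_n-1]

* Number of single-year rows each observation stands for;
* the first row per id has no known start and stays a single row
generate n = cond(missing(prev), 1, age - prev)

* Replace each observation by n copies of itself
expand n

* Number the copies within each original row and assign the years
bysort id age: generate year = age - (_N - _n)

list id year activity, sepby(id)
```

Two observations become eleven here: one for age 20 and ten for ages 21 through 30. slice limits this growth by adding observations only at the boundaries given in interval().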
To illustrate slice, suppose we are interested in studying the job status of individuals at different ages, using the example
data set introduced above. For example, we can examine the slice of the data corresponding to age 26 by typing
. use example, clear
. slice age, tvid(id) interval(25,26) saving(new26, replace) generate(grage)
-25 + 4
[25-26 + 4
[26- --------
8 records added to 20
This command added eight observations to the original twenty and stored a copy of the result in new26.dta. slice also added
a new variable, grage, to the data. grage is coded as ‘0’ when the individual’s age is less than or equal to 25, as ‘1’ when
the age is exactly 26, and as ‘2’ when the age is greater than 26. grage carries a value label, which appears in the following listing:
. list id activity age birth year grage