Stata Technical Bulletin
STB-4
meeting the optional if and in criteria are kept (sampled at 100%). If by() is specified, a # percent sample is drawn within
each set of values of groupvars, thus maintaining the proportion of each group.
Sample sizes are calculated as the closest integer to (#/100)N, where N is the number of observations in the data or
group. Thus, a 10 percent sample of 52 observations will select 5 observations, while a 10 percent sample of 56 observations
will select 6. Note that a 10 percent sample of 4 or fewer observations selects nothing.
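The closest-integer rule can be checked by hand. The lines below are a minimal sketch that uses Stata's round() function as a stand-in for the rule; that sample computes its sizes this way internally is an assumption, not something the text states.

```stata
* Sample sizes implied by the closest-integer rule, (#/100)*N
* 10 percent of 52 observations: 5.2 rounds to 5
display round(10/100 * 52)
* 10 percent of 56 observations: 5.6 rounds to 6
display round(10/100 * 56)
* 10 percent of 4 observations: 0.4 rounds to 0, so nothing is selected
display round(10/100 * 4)
```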
Sampling is defined as drawing observations without replacement. The previously released bootsamp (see ‘help bootsamp’)
will draw observations with replacement. If you are serious about drawing random samples, you must first set the random number
seed with the set seed command.
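As an illustration of setting the seed before sampling, the sketch below assumes a current Stata release (for sysuse and the bundled auto dataset); the seed value 12345 is arbitrary. Rerunning the same lines with the same seed should reproduce the same sample.

```stata
* Load an example dataset (assumes the auto data shipped with current Stata)
sysuse auto, clear
* Fix the random-number seed so the draw can be repeated; 12345 is arbitrary
set seed 12345
* Keep a 10 percent sample, drawn without replacement
sample 10
list make
```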
Say you have data on the characteristics of patients. You wish to draw a 10 percent sample of the data in memory. You
type ‘sample 10’.
Assume that among the variables is race. race==0 are whites and race==1 are nonwhites. You wish to keep 100% of
the nonwhite patients but only 10% of the white patients. You type ‘sample 10 if race==0’.
If instead you wish to draw a 10% sample of white and a 10% sample of nonwhite patients, you type ‘sample 10,
by(race)’. This differs from typing simply ‘sample 10’ in that, with by(), sample holds constant the ratio of white to
nonwhite patients.
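The difference between the if and by() forms can be seen on a small made-up dataset. The sketch below assumes a current Stata release; the 100 observations, the 70/30 split, and the seed are hypothetical values chosen only for illustration.

```stata
* Build a hypothetical dataset: 70 whites (race==0), 30 nonwhites (race==1)
clear
set obs 100
generate race = (_n > 70)
set seed 12345

* Keep all nonwhite patients but only 10% of white patients
preserve
sample 10 if race==0
tabulate race
restore

* Draw 10% within each group, holding the white/nonwhite ratio constant
sample 10, by(race)
tabulate race
```

Under the closest-integer rule described earlier, the first tabulation should show about 7 whites and all 30 nonwhites, and the second about 7 whites and 3 nonwhites.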
dm3 Automatic command logging for Stata
D. H. Judson, Dept. of Sociology, Washington State University
Although Stata’s interactive command language system is particularly useful for exploratory data analysis and instant
response, interesting analyses are often lost (or must be laboriously repeated) because the user forgets to log commands and/or
output. More importantly, sweeping data changes cannot be easily repeated, and such changes, at best, are dangerous. However,
the ability to rapidly generate new variables, predicted values, and the like is a useful one for many purposes. Thus, we are
faced with the problem: How do we perform analyses and data management while retaining a record of our work?
A simple solution is to revert to batch (ado) file programming, but this defeats the whole purpose of interactive and
exploratory data analysis. The solution, of course, is automatic logging. If log files can be generated automatically at the start
of a work session, the user never needs to worry that an analysis or data change cannot be remembered or repeated.
The following additions to the file profile.do accomplish automatic logging. They can be appended to the end of a
standard profile.do file. This implementation works in DOS and assumes that the logs are collected in the c:\logs subdirectory.
    * AUTOMATIC LOG PROGRAM
    capture program drop autolog
    program define autolog
        mac def _Name=1
        mac def _retc=602
        while %_retc==602 {
            cap log using c:\logs\auto'%_Name'.log, noproc
            mac def _retc=_rc
            mac def _Name=%_Name+1
        }
        mac drop _Name
        mac drop _retc
    end
    autolog
    program drop autolog
    di in bl "log on..."
The commands perform the following steps:
1) The macro _Name is set to 1.
2) Stata attempts to create the log file c:\logs\auto1.log. This log file uses the option noproc to log only the commands
entered at the keyboard, thus reducing its size.
3) If the file c:\logs\auto1.log already exists, the program increments the macro _Name by 1 and repeats the process at
step 2 with the log file c:\logs\auto2.log.
4) This process continues until a file is created.
5) The program autolog is created and executed.
Note that legal DOS file names are at most eight characters long (not counting the extension), so that log files can range from auto1.log to auto9999.log.
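For readers on a current Stata release, the same idea can be sketched with local macros and the cmdlog command, which records only the commands typed. This is not the author's code. It is an illustrative adaptation that assumes the directory c:/logs already exists and relies on return code 602 meaning that the file already exists.

```stata
* Sketch only: modern-syntax adaptation of autolog, for profile.do
capture program drop autolog
program define autolog
    local i = 1
    local rc = 602
    * r(602) means the file already exists, so try the next number
    while `rc' == 602 {
        capture cmdlog using "c:/logs/auto`i'.txt"
        local rc = _rc
        local i = `i' + 1
    }
end
autolog
program drop autolog
display as text "log on..."
```

As in the original, the loop stops on any return code other than 602, so a missing directory ends the search without opening a log.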