Stata Technical Bulletin
Include a help file for each ado-file in your submission. Help files are plain ASCII files with the same filename as the
associated ado-file and with the extension .hlp. All help files should adhere to the standard Stata format. The easiest way to
determine this format is to examine one of the help files delivered with Stata or one of the help files on the STB distribution
diskette. The caret (“^”) is used in help files to turn highlighting on and off.
Examples should be included with each ado-file, along with logs of their results, so we can confirm that the copies of the
ado-files we receive perform as described in the insert. Any examples in the text should be supplied as do-files, along with any
data used in the examples. These data sets are typically supplied on the distribution diskette to permit readers to replicate the
examples. Avoid using confidential or proprietary data in your examples or any other data that cannot be supplied freely to STB
readers.
If there are Stata graphs in your insert, include a do-file that recreates the graphs. You may also supply .gph files, but these
are typically recreated for publication. If you must supply a .gph file that cannot be recreated, do not use the title() option
of graph to title the graph. Instead, indicate the title of the graph, and we will add the title during the publication process.
Program design
You are, of course, free to design your Stata programs as you see fit. The point of the STB is to communicate your good
ideas to other Stata users, particularly when these ideas are novel. However, there are a few simple guidelines of program design
that will make your program easier for others to understand and to use.
First, adopt standard Stata syntax if at all possible. Stata’s stripped-down syntax is one of its greatest strengths. Command
names are usually simple English verbs that describe the action to be taken (list, summarize, tabulate). The other components
of Stata syntax cover the contingencies: variable lists specify the objects of the action, the expression details any calculations
needed, the if and in clauses restrict the sample, the weight clause specifies the weight, and the options handle all other
contingencies. Stata users find it easier to learn new commands if the commands follow standard syntax. And Stata’s parse
command makes it easy for your program to rely on Stata’s extensive parsing and error-checking code, as long as you adopt
standard syntax.
Second, allow users to type the shortest unique abbreviations of option names. The parse command allows you to specify
the shortest acceptable abbreviation for each option. Don’t make users type display if there are no other options that begin
with the letter “d”. Destructive options (clear, replace, etc.) are exceptions to this guideline. Stata style requires such options
to be typed in full to avoid unintended modifications of users’ data.
Third, as with program names, use ordinary English words for option names, whenever possible. And design your program
to accept complete words, if the user chooses to type them. For example, don’t require users to type ‘disp’ without also letting
them type ‘display’ if they wish.
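As a minimal sketch of these three guidelines, consider the hypothetical ado-file below. The name mysum and its two options are invented for illustration and are not part of Stata or the STB. The sketch assumes the parse conventions of Stata 3.1, in which the capitalized letters of an option name mark its shortest acceptable abbreviation; check the manual for your release before relying on the details.

```stata
* mysum.ado: hypothetical example; the command name and options are illustrative only
program define mysum
    version 3.1
    * standard syntax: a required list of existing variables, optional if and in
    local varlist "req ex"
    local if "opt"
    local in "opt"
    * Display may be shortened to d; replace is destructive, so it is
    * declared in lowercase and must be typed in full
    local options "Display replace"
    parse "`*'"
    * after parse, each option macro holds the option name if the user
    * typed it and is empty otherwise
    if "`display'" != "" {
        summarize `varlist' `if' `in'
    }
    if "`replace'" != "" {
        display "replace was specified"
    }
end
```

Under these assumptions, typing mysum x y if z > 0, d has the same effect as typing mysum x y if z > 0, display, and the full word is always accepted. Because the syntax is declared rather than hand-coded, the error checking is left to parse.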
The goal throughout is to write your programs as clearly and understandably as you do your prose. Aim for clarity over
cleverness.
dm21 Bringing large data sets into memory
Robert M. Farmer, Alabama Quality Assurance Foundation Inc., 205-970-1600
One characteristic of Stata that frequently frustrates new users is the way Stata allocates memory. Stata sets aside a
“rectangle” of memory for data. The size of this rectangle defines the maximum numbers of observations (maxobs) and variables
(maxvar) allowed. The “area” of the rectangle is the ultimate limit on the size of the data set that can be handled, but, at any
given moment, data sets must also meet the maxobs and maxvar constraints. If a data set is small enough to fit, but has either
too many observations or too many variables for the current data rectangle, the Stata use command will fail to load the data set. For
example:
. describe
Contains data
  Obs:     0 (max= 20434)
 Vars:     0 (max=    99)
Width:     0 (max=   200)
Sorted by:
. describe using test