Stata Technical Bulletin
The program is not smart. It looks at the first non-separator character on the first line and decides whether the first line is a
set of labels or not. According to the logic of the program, if that first character is a double-quote, the first line contains labels.
If not, the first line just contains data. Not smart, but very simple. If there are not enough labels (which includes the no-label
case), sep2stata puts in its own, calling them labeln, where n is the number of the column associated with the label (e.g.,
label47).
If you have varying numbers of values on each line, sep2stata assumes that the longest line (as defined by number of
values) in the dataset has the right number of variables, while any shorter ones are missing data.
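The label and line-length rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the sep2stata source: the function names, the use of an empty string for a missing value, the skipping of leading blanks, and the decision to let the label line count toward the longest line are all assumptions.

```python
import csv


def first_line_has_labels(first_line, sep=","):
    # Skip leading separators (and blanks, an assumption), then look for a double quote.
    return first_line.lstrip(sep + " \t").startswith('"')


def read_separated(text, sep=","):
    """Return (labels, rows), with labels filled in and short rows padded."""
    lines = text.splitlines()
    if not lines:
        return [], []
    rows = list(csv.reader(lines, delimiter=sep))
    labels = rows.pop(0) if first_line_has_labels(lines[0], sep) else []
    # The longest line, counted in values, fixes the number of variables.
    width = max([len(r) for r in rows] + [len(labels)])
    # Missing labels become label<n>, where n is the 1-based column number.
    labels = labels + [f"label{n}" for n in range(len(labels) + 1, width + 1)]
    # Shorter lines are treated as having missing data (shown here as "").
    padded = [r + [""] * (width - len(r)) for r in rows]
    return labels, padded
```

For example, a file whose first line is "id","name" and whose longest data line has three values would get the labels id, name, and label3.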
If the data contain strings, the program will notice. If more than 50% of the entries in a column are strings, the software will
declare the variable a string (in the dictionary) of sufficient width to handle the widest entry in the column. The logic isn’t
perfect, but works most of the time (you may want to fine-tune it for your datasets). Primarily, we use it to handle data exported
by Lotus 1-2-3. If you are interested in how we do that, contact me.
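A minimal Python sketch of the 50% rule follows. It assumes that an entry counts as a string whenever it cannot be parsed as a number, and that empty entries are missing values and are ignored. Neither detail is stated above, and the function names are hypothetical.

```python
def is_number(value):
    # Assumption: anything float() accepts is numeric (this includes "nan" and "inf").
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_type(values):
    """Return ("str", width) or ("num", None) for one column of raw text entries."""
    present = [v for v in values if v != ""]  # assumption: blanks are missing data
    strings = [v for v in present if not is_number(v)]
    # Strictly more than half of the entries must be strings.
    if present and len(strings) > 0.5 * len(present):
        return ("str", max(len(v) for v in present))  # wide enough for the widest entry
    return ("num", None)
```

Under this rule a column that is exactly half strings stays numeric, which is one of the places where fine-tuning for a particular dataset may be needed.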
Finally, to round out the set of software, a short do-file contains the commands to do all this processing without exiting
Stata. The commands contained in readsep.do will cause Stata to use sep2stata and read in the resulting dataset.
My apologies for the crudeness of the algorithms and the lack of features in general.
Examples
Typing ‘do readsep inputfilename’ in Stata imports data from a comma-separated file into Stata.
Typing ‘sep2stata inputfilename > outputfilename.dct’ in Unix converts a comma-separated file into a Stata-readable
file.
Typing ‘do s2sep outputfilename’ in Stata exports the current dataset from Stata.
Typing ‘stata2sep logfilename datafilename’ in Unix converts the log and data files individually (you should never need
to do this, but this is how things work internally) into a single comma-separated file.
sed7.1 Resistant nonlinear smoothing using Stata
William Gould, CRC, FAX 310-393-7551
Salgado-Ugarte and Garcia (1992) presented an implementation of the 4253EH smoother for Stata, an example of a nonlinear,
robust smoother as originally defined by Tukey (1977) and further developed by Velleman (1977). In the Tukey-Velleman notation,
4253EH means that the smoother first takes running medians of span 4, then smooths that result using running medians
of span 2, followed by running medians of span 5 and span 3, then applies an end-point correction, and finally smooths the overall
result with a Hanning linear smoother. The same notation describes many other smoothers, and there is a literature discussing
the relative merits of the smoothers that can be expressed in it (Velleman 1980).
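To make the building blocks concrete, here is a small Python sketch of two of them: a running median of odd span and the Hanning smoother (weights 1/4, 1/2, 1/4). Applied in sequence they give the smoother written 3H in the notation. The treatment of the ends is an assumption (the median window shrinks symmetrically near the ends and Hanning copies the end values), and even spans and the end-point rule E are deliberately left out.

```python
from statistics import median


def running_median_odd(y, span):
    """Running median of odd span; the window shrinks symmetrically near the ends."""
    if span % 2 == 0:
        raise ValueError("this sketch handles odd spans only")
    half = span // 2
    n = len(y)
    out = []
    for i in range(n):
        k = min(half, i, n - 1 - i)  # largest symmetric half-width that fits at i
        out.append(median(y[i - k:i + k + 1]))
    return out


def hanning(y):
    """Weighted average with weights 1/4, 1/2, 1/4; end values are copied unchanged."""
    if len(y) < 3:
        return list(y)
    middle = [(y[i - 1] + 2 * y[i] + y[i + 1]) / 4 for i in range(1, len(y) - 1)]
    return [y[0]] + middle + [y[-1]]


def smooth_3h(y):
    # "3H": running medians of span 3, then Hanning.
    return hanning(running_median_odd(y, 3))
```

With the series 1, 2, 100, 4, 5, the span-3 medians are 1, 2, 4, 5, 5, so the outlier 100 is removed before the Hanning step averages what remains. That is the sense in which these smoothers are resistant.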
Below, I implement nonlinear, robust smoothers more generally, allowing the smoother to be specified in the Tukey-Velleman
notation. The syntax of the nlsm command is
nlsm compound_smoother[,twice] varname, generate(newvar) [noshift]
where compound_smoother is S[S...] and S is
{1|2|3|4|5|6|7|8|9}[R]
3[R]S[S|R][S|R]...
E
H
Examples of compound_smoother include 3RSSH; 3RSSH,twice; 4253H; 4253H,twice; or 43RSR2H,twice. nlsm, unlike most
Stata commands, does not distinguish between upper and lowercase letters, so 4253h is equivalent to 4253H.
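The grammar above is simple enough to tokenize with one regular expression. The Python sketch below is an illustration only and is not how nlsm itself parses its argument. The function name is hypothetical, and reading a 3 followed by S greedily as a single 3[R]S... component is an assumption.

```python
import re

# Try the longer 3[R]S[S|R]... form first, then a single span with optional R, then E or H.
_COMPONENT = re.compile(r"3R?S[SR]*|[1-9]R?|E|H")


def parse_compound_smoother(spec):
    """Split a specification such as '43RSR2H,twice' into (components, twice)."""
    body, comma, suffix = spec.upper().partition(",")  # case does not matter
    if comma and suffix.strip() != "TWICE":
        raise ValueError(f"unrecognized suffix in {spec!r}")
    components = []
    pos = 0
    while pos < len(body):
        m = _COMPONENT.match(body, pos)
        if m is None:
            raise ValueError(f"unrecognized smoother at position {pos} in {spec!r}")
        components.append(m.group())
        pos = m.end()
    if not components:
        raise ValueError("empty smoother specification")
    return components, bool(comma)
```

For instance, 4253h yields the components 4, 2, 5, 3, H with twice off, while 43RSR2H,twice yields 4, 3RSR, 2, H with twice on.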
Options
generate(newvar) is not optional; it specifies the name of the new variable to be created.
noshift prevents shifting the data one time period forward for each pair of even smoothers applied. This option is useful only
if subsequent smoothers are to be applied in separate commands and, in that case, making required shifts of the data is the
user’s responsibility.