Stata Technical Bulletin
Stata would have problems, however, if the same data arrangement appeared in a dictionary file:
. type in1.dct
dictionary {
        int x1
        int x2
        int x3
}
11 12 13
21 22
23 31 32 33
. infile using in1
dictionary {
        int x1
        int x2
        int x3
}
(4 observations read)
. list
            x1         x2         x3
  1.        11         12         13
  2.        21         22          .
  3.         .          .          .
  4.        23         31         32
Stata’s dictionary files are the preferred form for storing and documenting raw data. The dictionary subcommands can handle
most kinds of formatted data including multi-line records and data sets without carriage returns ([5d] infile). Nonetheless, Murphy’s
law guarantees that you will occasionally confront data sets that confound Stata’s dictionary capabilities. More commonly, you
will have a data set that Stata’s dictionary features can handle but only with difficulty. Clearly, life would be simpler if all raw
data sets were rectangular, as in the first example.
I have written a C program called block that makes my life simpler. block takes an arbitrary ASCII file as input and
produces as output the same information arrayed rectangularly. The following example illustrates how to use block.
C:> type in1
11 12 13
21 22
23 31 32 33
C:> block
Name of the input file: in1
Name of the output file: out1
Number of columns: 3
Read in 9 fields from in1
Wrote out 3 rows of 3 columns to out1
C:> type out1
11 12 13
21 22 23
31 32 33
block also handles gracefully an input whose number of fields is not a multiple of the number of columns:
C:> type in2
11 12 13
21 22
23 31 32 33 44
C:> block
Name of the input file: in2
Name of the output file: out2
Number of columns: 3
Read in 10 fields from in2
Wrote out 3 rows of 3 columns to out2
WARNING: last row not complete.
C:> type out2
11 12 13
21 22 23
31 32 33
44
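The source of block is not reproduced here, so the following is only a minimal sketch, in C, of the reshaping step that the two sessions above illustrate. It is not the author's code. The function name reblock, its signature, and the 255-character limit on a field are all assumptions made for this illustration. The sketch reads whitespace-separated fields, ignores the line structure of the input, and writes a fixed number of fields per output line.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the original block source).
 * Copies whitespace-separated fields from `in` to `out`, writing `ncols`
 * fields per output line. Returns the number of fields read, or -1 if
 * writing to `out` failed. If `incomplete` is not NULL, it is set to 1
 * when the last output row holds fewer than `ncols` fields, else to 0.
 */
long reblock(FILE *in, FILE *out, int ncols, int *incomplete)
{
    char field[256];   /* assumption: longer fields would be split */
    long nfields = 0;

    assert(ncols > 0);

    /* "%255s" skips blanks, tabs, and newlines before each field, so the
       way the input is broken into lines has no effect on the output. */
    while (fscanf(in, "%255s", field) == 1) {
        if (nfields > 0) {
            /* newline at a row boundary, a single space otherwise */
            fputc(nfields % ncols == 0 ? '\n' : ' ', out);
        }
        fputs(field, out);
        nfields++;
    }
    if (nfields > 0)
        fputc('\n', out);   /* terminate the final (possibly short) row */

    if (incomplete != NULL)
        *incomplete = (nfields % ncols != 0);

    return ferror(out) ? -1 : nfields;
}
```

A caller would open the two files with fopen, call reblock(in, out, 3, &incomplete), and print a warning when incomplete is set, much as block does in the second session.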