Stata Technical Bulletin

presents a tabulation. In part, codebook makes this determination by the number of unique values of the variable. If the number
is 9 or fewer, codebook reports a tabulation; otherwise, it reports summary statistics. Specifying tabulate(15) would change the
rule to produce tabulations whenever a variable takes on 15 or fewer unique values.
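
The threshold rule can be sketched in a few lines. This is only an illustration in Python, not codebook's actual implementation: the function name choose_report and the use of None for missing values are assumptions, and the real command weighs other factors as well (the text says the count is only part of the decision).

```python
def choose_report(values, tabulate=9):
    """Pick a report style from the number of unique nonmissing values.

    Hypothetical helper: `tabulate` plays the role of codebook's
    tabulate(#) option, and None stands in for a missing value.
    """
    unique = {v for v in values if v is not None}
    return "tabulation" if len(unique) <= tabulate else "summary"


ten_values = list(range(10))                        # 10 unique values
default_choice = choose_report(ten_values)          # default threshold of 9
relaxed_choice = choose_report(ten_values, tabulate=15)
```

With the default threshold, a variable with 10 unique values gets summary statistics; raising the threshold to 15 switches it to a tabulation.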

The mv option, which we specified above, asks codebook to search the data to determine the pattern of missing values.
This is a CPU-intensive task, which is the only reason that mv is an option. The result is useful. For instance, in the case of the
last variable, tempjuly, codebook reported that every time tempjan is missing, tempjuly is missing, and vice versa. Looking
back up the output to the cooldd variable, codebook also reports that the pattern of missing values is the same for cooldd
and heatdd. In both cases, the correspondence is indicated with “<->”.

For cooldd, codebook also states that “tempjan==. --> cooldd==.”. The one-way arrow means that a missing tempjan
value implies a missing cooldd value, but a missing cooldd value does not necessarily imply a missing tempjan value.
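
The two kinds of arrow can be made concrete with a small sketch. It is written in Python purely for illustration and is not how codebook is implemented; the function missing_patterns, the toy data, and the use of None for a missing value are all assumptions. For a target variable, it compares the missing-value indicator of every other variable and reports “<->” when the two indicators coincide and “-->” when the other variable being missing implies the target is missing.

```python
def missing_patterns(data, target):
    """List missing-value relationships between `target` and other variables.

    `data` maps variable names to equal-length lists; None marks a missing
    value. Hypothetical helper, loosely mimicking the lines printed under
    "missing values:" when the mv option is specified.
    """
    miss_t = [v is None for v in data[target]]
    found = []
    if not any(miss_t):
        return found  # nothing to report if the target is never missing
    for name, column in data.items():
        if name == target:
            continue
        miss_o = [v is None for v in column]
        if not any(miss_o):
            continue
        # other missing  =>  target missing
        o_implies_t = all(t for o, t in zip(miss_o, miss_t) if o)
        # target missing =>  other missing
        t_implies_o = all(o for o, t in zip(miss_o, miss_t) if t)
        if o_implies_t and t_implies_o:
            found.append(f"{name}==. <-> {target}==.")
        elif o_implies_t:
            found.append(f"{name}==. --> {target}==.")
    return found


# Toy data (invented for illustration, not the bulletin's dataset).
toy = {
    "tempjan":  [30.1, None, 41.5, 28.0],
    "tempjuly": [75.2, None, 80.3, 71.9],
    "cooldd":   [900, None, None, 650],
    "heatdd":   [5200, None, None, 6100],
}
cooldd_report = missing_patterns(toy, "cooldd")
tempjan_report = missing_patterns(toy, "tempjan")
```

In the toy data, tempjan is missing only where cooldd is also missing, so the report for cooldd contains a one-way arrow from tempjan, while heatdd and cooldd share exactly the same pattern and get the two-way arrow. Every pair of variables has to be compared across all observations, which suggests why the search is costly.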

codebook has some other features worth mentioning. When codebook determines that neither a tabulation nor summary
statistics are appropriate, for instance, in the case of a string variable or in the case of a numeric variable taking on many values
all of which are labeled, it reports a few examples instead. In the example above, codebook did that for the variable name.
codebook is also on the lookout for common errors you might make in dealing with the data. In the case of string variables,
this includes leading, embedded, and trailing blanks. codebook informed us that name includes embedded blanks. If name ever
had leading or trailing blanks, it would have mentioned that, too.

Another feature of codebook—this one for numeric variables—is to determine the units of the variable. For instance,
tempjan and tempjuly both have units of .1, meaning that temperature is recorded to tenths. codebook handles precision
considerations (note that tempjan and tempjuly are floats) in making this determination. If we had a variable in our data
recorded in 100s (e.g., 21,500, 36,800, etc.), codebook would have reported the units as 100. If we had a variable that took on
only values divisible by 5 (5, 10, 15, etc.), codebook would have reported the units as 5.
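
One way to think about the units calculation is as a greatest common divisor taken over the data after rounding away floating-point noise. The sketch below is in Python and is an assumption about the idea, not a description of codebook's algorithm; the function name units and the choice of 6 significant digits (roughly the precision of a float) are hypothetical.

```python
from decimal import Decimal
from fractions import Fraction
from functools import reduce
from math import gcd


def units(values, digits=6):
    """Largest step such that every value is an integer multiple of it.

    Values are first rounded to `digits` significant digits so that a
    float such as 35.70000076 is treated as 35.7. Hypothetical helper;
    None marks a missing value and is skipped.
    """
    fracs = [Fraction(Decimal(f"{v:.{digits}g}"))
             for v in values if v is not None]
    # Least common denominator of all the rounded values.
    den = reduce(lambda a, b: a * b // gcd(a, b),
                 (f.denominator for f in fracs), 1)
    # GCD of the values once they are scaled to integers.
    g = reduce(gcd, (f.numerator * (den // f.denominator) for f in fracs), 0)
    return Fraction(g, den)


tenths = units([2.200000047683716, 35.70000076293945, 72.6])  # float noise
hundreds = units([21500, 36800])
fives = units([5, 10, 15])
```

Under these assumptions the three calls give 1/10, 100, and 5, matching the three cases described in the text.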

codebook, without arguments, is most usefully combined with log to produce a printed listing for enclosure in a notebook
documenting the data. codebook is, however, also useful interactively, since you can specify one or a few variables:

. codebook tempjan, mv

tempjan ------------------------------------------- Average January temperature
                  type:  numeric (float)

                 range:  [2.2,72.6]                   units:  .1
         unique values:  310                  coded missing:  2 / 956

                  mean:   35.749
              std. dev:  14.1881

           percentiles:        10%       25%       50%       75%       90%
                              20.2      25.1      31.3      47.8      55.1

        missing values:  tempjuly==. <-> tempjan==.

crc14 Pairwise correlation coefficients

The already-existing correlate command calculates correlation coefficients using casewise deletion: when you request
correlations of variables x1, x2, ..., xk, any observation for which any of x1, x2, ..., xk is missing is not used. Thus, if x3 and
x4 have no missing values, but x2 is missing for half the data, the correlation between x3 and x4 is calculated using only the
half of the data for which x2 is not missing. Of course, you can obtain the correlation between x3 and x4 using all the data by
typing ‘correlate x3 x4’.
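
The difference between casewise and pairwise deletion described above can be seen in a small numerical sketch. It is written in Python for illustration only and does not reproduce Stata's code; the helper names pearson, corr_casewise, and corr_pairwise and the toy data are assumptions, with None standing in for a missing value.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists without missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corr_casewise(data, a, b):
    """Correlate a and b using only rows complete on EVERY variable in data."""
    n = len(data[a])
    rows = [i for i in range(n)
            if all(col[i] is not None for col in data.values())]
    return pearson([data[a][i] for i in rows],
                   [data[b][i] for i in rows]), len(rows)


def corr_pairwise(data, a, b):
    """Correlate a and b using every row where both a and b are present."""
    n = len(data[a])
    rows = [i for i in range(n)
            if data[a][i] is not None and data[b][i] is not None]
    return pearson([data[a][i] for i in rows],
                   [data[b][i] for i in rows]), len(rows)


# Toy data: x2 is missing for half the observations; x3 and x4 are complete.
toy = {
    "x2": [1.0, None, 2.0, None, 3.0, None],
    "x3": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x4": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
}
r_case, n_case = corr_casewise(toy, "x3", "x4")   # uses 3 observations
r_pair, n_pair = corr_pairwise(toy, "x3", "x4")   # uses all 6 observations
```

In this toy example the casewise calculation keeps only the three rows where x2 is present, while the pairwise calculation uses all six rows, so the two correlations between x3 and x4 differ even though neither variable has any missing values.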

The new pwcorr command makes obtaining such pairwise correlation coefficients easier:

pwcorr [varlist] [weight] [if exp] [in range] [, obs sig print(#) star(#) bonferroni sidak]

pwcorr calculates all the pairwise correlation coefficients between the variables in varlist or, if varlist is not specified, all the
variables in the data.

Options

obs adds a line to each row of the matrix reporting the number of observations used in calculating the correlation coefficient.
sig adds a line to each row of the matrix reporting the significance level of each correlation coefficient.


