Stata Technical Bulletin
snp15.1
Update to somersd
Roger Newson, Guy’s, King’s and St Thomas’ School of Medicine, London, UK, [email protected]
Abstract: somersd calculates confidence intervals for rank-order statistics. It has been improved, streamlined, debugged, and
intensively certified.
Keywords: Somers’ D, Kendall’s tau, rank correlation, confidence intervals, nonparametric methods.
Syntax
somersd [varlist] [weight] [if exp] [in range] [, cluster(varname) level(#) taua tdist
transf(transformation_name) cimatrix(new_matrix)]
where transformation_name is one of
iden | z | asin | rho | zrho
fweights, iweights, and pweights are allowed.
New options
cimatrix(new_matrix) specifies an output matrix to be created, containing estimates and confidence limits for the untransformed
Somers’ D, Kendall’s τa or Greiner’s ρ parameters. If transf() is specified, then the confidence limits will be asymmetric
and based on symmetric confidence limits for the transformed parameters. This option (like level) may be used in replay
mode as well as in non-replay mode.
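The way symmetric limits on a transformed scale become asymmetric limits on the original scale can be illustrated with a short Python sketch. This is not the somersd code. It assumes the z transformation is Fisher's z (the hyperbolic arctangent), a delta-method standard error on the z scale, and a normal rather than t critical value; the function name z_transformed_ci is hypothetical.

```python
from math import atanh, tanh
from statistics import NormalDist


def z_transformed_ci(estimate, std_err, level=95.0):
    """Back-transform a symmetric interval for atanh(estimate).

    Assumptions (not taken from somersd itself): Fisher's z transform,
    a delta-method standard error, and a normal critical value.
    """
    z = atanh(estimate)                       # transformed estimate
    se_z = std_err / (1.0 - estimate ** 2)    # delta method: d/dD atanh(D) = 1/(1 - D^2)
    q = NormalDist().inv_cdf(0.5 + level / 200.0)
    # Symmetric limits on the z scale, mapped back to the (-1, 1) scale.
    return tanh(z - q * se_z), tanh(z + q * se_z)


lower, upper = z_transformed_ci(0.5, 0.1)
```

For an estimate of 0.5 with standard error 0.1, the lower limit lies further from 0.5 than the upper limit does, and both limits stay strictly inside the interval from -1 to 1.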
New saved results
somersd now additionally saves the name of the program called by predict in the macro e(predict).
Remarks
somersd was introduced in Newson (2000). The program calculates confidence intervals for the rank order statistics Somers’
D and Kendall’s τa for the first variable of varlist as a predictor of each of the other variables in varlist, with estimates and
jackknife covariances saved as estimation results. The new version contains the following improvements:
1. The new option cimatrix has been added (mostly for programmers).
2. The program somers_p has been added as the predict program for somersd, and it warns the user that predict should
not be used after somersd.
3. somersd has been streamlined. If cluster() is not specified, then processing time is now quadratically dependent on the
number of distinct value combinations in varlist, instead of being quadratically dependent on the number of observations
as before. This makes a vast difference to the time taken to process discrete variables in data sets with thousands of
observations.
4. A bug has been corrected, which formerly caused incorrect output when the taua option was used with unequal fweights.
(This bug was not present in the earlier version of somersd circulated via the Ideas list, and there was no excuse for me
to allow it to creep in when upgrading somersd for the STB.)
5. The certification script used to certify somersd is now much more comprehensive than before, ruling out the above bug and
a large range of others. (See the online help for cscript.) Amongst other checks, it compares the jackknife confidence intervals
for Kendall’s τa with those produced by ktau and jknife (Gould 1995). The latter programs produce the same confidence
limits as somersd, taua tdist in the simplest case, without weights, clustering or transformations. However, ktau
and jknife take much longer, requiring a time cubically dependent on the number of observations.
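The saving described in item 3 can be illustrated with a minimal Python sketch of the point estimates only. It is not the somersd implementation and computes no jackknife variances. It assumes Kendall's τa is the mean over ordered pairs of distinct observations of sign(x_i − x_j)·sign(y_i − y_j), and that Somers' D of Y with respect to X is τa(X,Y)/τa(X,X). All function names are hypothetical. The naive version loops over pairs of observations, while the collapsed version loops over pairs of distinct (x, y) combinations weighted by their frequencies.

```python
from collections import Counter


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_a_naive(x, y):
    """Kendall's tau-a by looping over all ordered pairs of observations."""
    n = len(x)
    total = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def tau_a_collapsed(x, y):
    """Kendall's tau-a by looping over pairs of distinct (x, y) combinations.

    Each combination carries its frequency; pairs within one combination
    are tied on both variables and contribute zero.
    """
    counts = Counter(zip(x, y))
    n = sum(counts.values())
    total = 0
    for (x1, y1), c1 in counts.items():
        for (x2, y2), c2 in counts.items():
            total += c1 * c2 * sign(x1 - x2) * sign(y1 - y2)
    return total / (n * (n - 1))


def somers_d(x, y, tau=tau_a_collapsed):
    """Somers' D of y with respect to the predictor x."""
    return tau(x, y) / tau(x, x)


# Illustrative data: 8 observations but only 6 distinct combinations.
x = [1, 1, 2, 2, 3, 3, 3, 1]
y = [0, 1, 0, 1, 1, 1, 0, 0]
same = tau_a_naive(x, y) == tau_a_collapsed(x, y)
```

With discrete variables the number of distinct combinations stays small as the number of observations grows, so the collapsed loop does far less work while returning the same estimate.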
Acknowledgments
I would like to thank Bill Gould of Stata Corporation for suggesting the somers_p program, and Bill Gould and Ken Higbee
of Stata Corporation for a great deal of very helpful advice on designing certification scripts.
References
Gould, W. 1995. sg34: Jackknife estimation. Stata Technical Bulletin 24: 25-29. Reprinted in Stata Technical Bulletin Reprints, vol. 4, pp. 165-170.
Newson, R. 2000. snp15: somersd—Confidence limits for nonparametric statistics and their differences. Stata Technical Bulletin 55: 47-55.