Computing optimal sampling designs for two-stage studies



Stata Technical Bulletin

STB-58


Example 1

We simulate 1,000 two-generation families. Ages in the first generation have mean 70 and standard deviation 10; ages in the
second generation have mean 40 and standard deviation 10. The frequency of allele A is assumed to be 0.05, and the mean number
of siblings in the second generation is 5. The simulated family data are saved in the file output.dta.
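
To give a feel for what an allele frequency of 0.05 implies, the genotype proportions can be worked out under Hardy-Weinberg equilibrium. This is an assumption made here for illustration only; the text does not say how simuped2 assigns genotypes. With p = 0.05 denoting the frequency of allele A:

```latex
\begin{aligned}
P(AA) &= p^2 = 0.05^2 = 0.0025,\\
P(Aa) &= 2p(1-p) = 2(0.05)(0.95) = 0.095,\\
P(aa) &= (1-p)^2 = 0.95^2 = 0.9025.
\end{aligned}
```

Under that assumption roughly nine in ten simulated individuals would carry genotype aa, so a short listing of the data may well show no carriers of A at all.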

. simuped2 70 10 40 10, reps(1000) sav(output) alle(0.05) sib(5)

. use output

. describe

Contains data from output.dta
  obs:         6,818
 vars:             6                           23 Oct 2000 07:51
 size:       177,268 (83.0% of memory free)
  1. famid     float  %9.0g
  2. id        float  %9.0g
  3. degree    float  %9.0g
  4. female    float  %9.0g
  5. age       float  %9.0g
  6. genotype  str2   %9s
Sorted by:

. list

          famid        id    degree    female       age  genotype
   1.         1         1         1         0        73        aa
   2.         1         2         1         1        76        aa
   3.         1         3         2         0        28        aa
   4.         1         4         2         1        43        aa
   5.         1         5         2         1        26        aa
   6.         2         1         1         0        64        aa
   7.         2         2         1         1        εs        aa
   8.         2         3         2         0        38        aa
   9.         2         4         2         1        46        aa
  10.         2         5         2         0        ει        aa
 (output omitted )
6809.       999         2         1         1        ει        aa
6810.       999         3         2         0        εo        aa
6811.       999         4         2         1        38        aa
6812.       999         5         2         0        37        aa
6813.       999         6         2         1        41        aa
6814.      1000         1         1         0        74        aa
6815.      1000         2         1         1        70        aa
6816.      1000         3         2         1        38        aa
6817.      1000         4         2         1        41        aa
6818.      1000         5         2         1        38        aa

A total of 6,818 individuals are generated in the 1,000 families. The variable famid identifies the simulated family, and id
identifies the person within that family. The variable degree gives the generation to which a person belongs, female is 1 for
a female and 0 otherwise, age is the simulated age, and genotype is the person's genotype.
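
The simulated file can be checked against the requested parameters with a few standard commands. The following is only a sketch and is not part of the original example. It assumes output.dta and the variable names shown by describe, and it uses current Stata syntax: the egen functions total() and tag() may be unavailable or named differently in the release used for this article. As a rough guide, 6,818 individuals in 1,000 families is about 6.8 persons per family, in line with two first-generation members plus close to five second-generation members.

```stata
* Sketch: sanity checks on the simulated two-generation data.
* Assumes output.dta with the variables famid, degree, age, and genotype.
use output, clear

* Genotype distribution; with allele frequency 0.05, aa should dominate
tabulate genotype

* Mean and SD of age by generation; expect roughly 70/10 and 40/10
tabstat age, by(degree) statistics(mean sd n)

* Second-generation members per family; expect a mean near 5
egen nsib = total(degree == 2), by(famid)
egen onefam = tag(famid)
summarize nsib if onefam
```

The tag() function marks one observation per family, so the final summarize counts each family once instead of once per member.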

Example 2

We simulate 2,000 three-generation families. The mean ages of people in the first, second, and third generations are 80,
50, and 20, respectively, with a standard deviation of 10 in every generation. The frequency of allele A is
assumed to be 0.1. The mean numbers of siblings in the second and third generations are 4 and 3.5, respectively.

. set memory 50m

. simuped3 80 10 50 10 20 10, reps(2000) alle(0.1) sib(4) si3(3.5)
. use temp
. describe

Contains data from temp.dta
  obs:        29,667
 vars:             8                           23 Oct 2000 18:38
 size:     1,008,678 (98.1% of memory free)
  1. famid     float  %9.0g
  2. id        float  %9.0g
  3. degree    float  %9.0g


