Estimating optimal sample size for genetic differentiation
using analytical and bootstrap techniques
Juan F. Fernandez-M.
Department of Biology
University of Missouri, St. Louis
St. Louis, MO 63121-4499
email: S997022@admiral.umsl.edu
Precision in the analysis of the distribution of genetic diversity
and in the estimation of gene flow rates among populations is constrained by
the sampling design, i.e., the number of populations and the number of individuals
per population. Few attempts have been made to determine analytically
a sample size large enough to yield statistically significant estimates
of genetic differentiation or gene flow e.g., . Here, I use the methods
proposed by to estimate the optimal sample size per population when
the total sample size is held constant based on the premise of minimizing
the variance of Gst from known genetic data. Allozyme data from
five loci (AAT-2, AAT-3, DIA-1, DIA-2, and MNR-2) of Sassafras
albidum (Lauraceae), sampled in 36 subpopulations in the Missouri Ozarks,
were analyzed using the program HaploDiv (Petit 1995). Although the program
is intended for haploid data, it yields results close to those of a diploid procedure
if the species is outcrossed (Petit, pers. comm.). The original sample
sizes were between 24 and 48 individuals per population.
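For orientation, the differentiation statistic whose variance is minimized can be written in Nei's standard form. The symbols H_S, H_T, and p_i are notation introduced here for illustration and do not appear in the abstract; the estimators of Pons and Petit (1995) and Pons and Chaouche (1995) add sampling corrections that are not reproduced in this sketch.

```latex
G_{ST} = \frac{H_T - H_S}{H_T}, \qquad H = 1 - \sum_i p_i^2
```

Here H_S is the gene diversity H averaged over populations, and H_T is H computed from the allele frequencies p_i averaged over populations.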
Only the loci that showed significant genetic differentiation
(MNR-2, Gst = 0.3389; DIA-2, Gst = 0.0991) were useful in estimating
the optimal sample size. The results indicate that 4 diploid individuals
per population for the MNR-2 locus, and 9 for the DIA-2 locus, are enough
to detect population differentiation at those loci.
For the weakly differentiated loci (AAT-2, AAT-3, and DIA-1), a resampling
analysis was performed, simulating the 36 populations with a constant sample
size (n = 10, 20, ..., 100) per population to estimate empirically when the
bootstrap 95% confidence interval on Gst approached the observed
value for the total data. The simulated samplings suggested: 1) that at
least 20 individuals per population are required for the 95% confidence interval
to overlap the true mean Gst; 2) that the estimated variance stabilizes
when the sample size exceeds 30 individuals per population; and 3)
that the estimator that best approaches the true value is the Gst
estimator proposed by Pons and Chaouche (1995). For a locus-by-locus
analysis, the least differentiated locus will determine the minimum sample
size requirements.
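The resampling analysis described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: it computes plain, uncorrected Nei's Gst rather than the Pons and Chaouche (1995) estimator, it uses a simple percentile interval, and the function names (`nei_gst`, `resample_gst`, `bootstrap_ci`) and the genotype layout (one list of two-allele tuples per population) are hypothetical.

```python
import random
from collections import Counter


def gene_diversity(freqs):
    """Gene diversity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(populations):
    """Uncorrected Nei's Gst = (Ht - Hs) / Ht for a single locus.

    populations: list of lists, each holding the allele copies
    (two per diploid individual) observed in one population.
    """
    alleles = sorted({a for pop in populations for a in pop})
    pop_freqs = []
    for pop in populations:
        counts = Counter(pop)
        pop_freqs.append([counts[a] / len(pop) for a in alleles])
    k = len(pop_freqs)
    # Hs: mean within-population diversity
    hs = sum(gene_diversity(f) for f in pop_freqs) / k
    # Ht: diversity of the unweighted mean allele frequencies
    mean_freqs = [sum(f[i] for f in pop_freqs) / k for i in range(len(alleles))]
    ht = gene_diversity(mean_freqs)
    return 0.0 if ht == 0 else (ht - hs) / ht


def resample_gst(genotypes, n, rng):
    """Draw n individuals with replacement from each population; return Gst."""
    pops = []
    for pop in genotypes:
        drawn = [rng.choice(pop) for _ in range(n)]
        pops.append([allele for individual in drawn for allele in individual])
    return nei_gst(pops)


def bootstrap_ci(genotypes, n, replicates=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Gst at a fixed per-population n."""
    rng = random.Random(seed)
    values = sorted(resample_gst(genotypes, n, rng) for _ in range(replicates))
    lo_idx = max(0, round((alpha / 2) * replicates))
    hi_idx = min(replicates - 1, round((1 - alpha / 2) * replicates) - 1)
    return values[lo_idx], values[hi_idx]


if __name__ == "__main__":
    # Toy data (invented for illustration): two populations, one locus.
    toy = [
        [("A", "A")] * 10 + [("A", "B")] * 10,
        [("B", "B")] * 15 + [("A", "B")] * 5,
    ]
    for n in (10, 20, 30):
        print(n, bootstrap_ci(toy, n, replicates=500))
```

Running the loop over n = 10, 20, ..., 100 and comparing each interval with the Gst of the full data set mirrors the comparison described in the text; swapping `nei_gst` for a bias-corrected estimator would be the natural next step.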
Literature Cited
- Assuncao, R. and C. M. Jacobi. 1996. Optimal sampling design for studies
  of gene flow from a point using marker genes or marked individuals. Evolution
  50(2): 918-923.
- Epperson, B. K. and T. Li. 1996. Measurement of genetic structure within
  populations using Moran's spatial autocorrelation statistics. Proc. Natl.
  Acad. Sci. USA 93: 10528-10532.
- Petit, R. J. 1995. HaploDiv. A Pascal program for the analysis of diversity
  for haploid data.
- Pons, O. and K. Chaouche. 1995. Estimation, variance and optimal sampling
  of gene diversity II. Diploid locus. Theor. Appl. Genet. 91: 122-130.
- Pons, O. and R. J. Petit. 1995. Estimation, variance and optimal sampling
  of gene diversity I. Haploid locus. Theor. Appl. Genet. 90: 462-470.