Organizer: Victoria L. Sork
Participants: W. Thomas Adams, Diane Campbell, Frank Davis, Rodney Dyer, Juan Fernandez, Michael Gilpin, James Hamrick, John Nason, Joseph Neigel, Rémy Petit, Outi Savolainen, Peter Smouse, Eleanor Steinberg
Abstract. A workshop was held at the National Center for Ecological
Analysis and Synthesis to discuss gene flow on an ecological, rather than
an evolutionary, time scale. Recently, ecologists, conservation biologists,
and ecosystem managers have been interested in monitoring on-going gene
flow to understand environmental and landscape influences on genetic variation
in existing populations. Our first goal was to review current approaches
to the estimation of gene flow and their usefulness for measuring on-going
gene flow. Our overwhelming conclusion was that indirect methods based
on F-statistics are not sufficiently sensitive to measure gene flow
on this scale. Instead, direct methods of genealogical analysis offer a
reliable alternative at a small scale but may have more limited utility
for scaling up. Because gene flow occurs on a landscape scale, we explored
the usefulness of current population genetic approaches for scaling up
our estimates and we discussed the potential contribution of metapopulation
and landscape models. We evaluated the relationships between population
genetic and metapopulation models, but concluded that a new synthesis integrating
the two approaches is not yet ready for development. However, workshop
participants explored in detail a new approach to the study of gene flow
that would be feasible at the landscape scale and might generate a parameter
of migration needed for metapopulation and landscape models. Two approaches
are discussed by John Nason and Peter Smouse in Part II of this report.
Contents:
Part I: Current approaches: Gene flow on ecological time scales (Summary of workshop discussions)
Indirect methods using F-statistics
Box A. Genetic structure of subdivided populations.
Direct methods using parentage analysis
Box B. Neighborhood size of continuous populations
Box C. Guidelines for the use of models to estimate pollen-mediated immigration.
Table 1. Sample sizes needed for direct estimation of gene flow given 80% and 90% paternity exclusion.
Shortcomings of direct methods
Metapopulation and landscape approaches to gene flow
Part II. Two essays on new approaches.
A. Scaling up: Enlarging the spatial scale of parentage analysis (John Nason)
B. Thoughts on a genetic structure-like approach to pollen flow (Peter Smouse)
Literature Cited for Parts I and II
Part III. Abstracts of workshop presentations
Appendix A. Gene-flow related software available through NCEAS website
Part I. Current approaches: Gene flow on ecological time scales
The study of gene flow (i.e., movement of genes among populations) has been a vital topic in evolutionary biology. Most theoretical models of gene flow stem from concepts developed by Sewall Wright that are based either on continuous populations, using an isolation by distance approach (Wright 1943, 1946), or on populations as islands that become differentiated through mutation and genetic drift. The island model assumes equilibrium conditions, gene flow among all populations, and populations of equal size. Recently, ecologists, conservation biologists, forest managers, and ecosystem managers have become interested in gene flow on an ecological time scale (Sork abstract, Part III of these proceedings). Using biochemical or molecular genetic markers, many of these scientists have borrowed the genetic structure approach to estimate gene flow (Nem). Yet both the time scale and the spatial scale of these studies violate the assumptions of gene flow models based on F-statistics or other genetic structure approaches.
An alternative way to estimate gene flow is to use parentage analysis, which can identify parents (usually fathers) and then quantify the pattern of gene movement. Meagher (1986) presented an example of paternity analysis in a plant population in order to quantify variance in reproductive success as a function of distance. Subsequent modifications of this basic approach allow the technique to be used to study gene movement into populations (Devlin and Ellstrand 1990; Roeder et al. 1989; Smouse and Meagher 1994). The parentage analysis approach provides a direct estimate of gene movement, which is a critical element of gene flow, but it does not yield an estimate of Nem, because it is usually based on one or two reproductive episodes rather than on gene flow over a whole generation.
Many studies of current gene flow, especially those in conservation biology, are aimed at understanding gene movement on a regional or landscape scale. As continuous populations become fragmented, they may assume metapopulation dynamics through extinction and recolonization of the different fragments. It is not clear whether recent modeling approaches in metapopulation biology and landscape ecology offer viable insight into gene movement, nor whether current measurements of gene movement contribute the migration estimates needed for landscape models.
In this workshop report, we summarize our discussions of gene flow on
an ecological time scale. A major emphasis of the workshop was the application
of gene flow models to tree populations, although some participants work
with other types of organisms. Most applications of gene flow models have
been in small populations (natural or managed) or within stands
of larger populations. Most work has emphasized pollen dispersal dynamics
within stands and the proportion of pollen entering stands from outside. However,
little work to date has examined gene flow dynamics among stands. The specific
objectives were: (1) to review indirect and direct methods of estimating
gene flow; (2) to review available statistical models for estimating gene
flow; and (3) to evaluate the extent to which landscape approaches and
spatially explicit models can be incorporated into gene flow studies. The
workshop included presentations (see abstracts in
Part III of these proceedings), subgroup discussions (included in this
report below), and discussion of software programs (see Appendix
A. Gene flow related Software).
Indirect methods using F-statistics
Historically, the estimation of gene flow has relied on indirect methods, or those based on Wright's parameter of population differentiation, FST (see Box A). In many respects, FST is an ideal parameter that summarizes the evolutionary history of the populations under study, yielding insights about the relative importance of gene flow and genetic drift. Moreover, the relative ease of collecting the requisite data and the facility of analysis make indirect methods an obvious choice for many evolutionary and conservation biology studies. Neigel (1997) summarizes the advantages of using the indirect approach for estimating FST as a parameter to estimate Nem, and also describes recent advances in the analysis of genealogical relationships of genes (the coalescent approach) as an alternative method of estimating gene flow (see Neigel abstract, Part III of these proceedings).
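The link between FST and Nem that indirect methods rely on can be sketched as follows. This is a minimal illustration, assuming a single biallelic locus, populations of equal size, no sample-size correction, and Wright's infinite-island equilibrium FST ≈ 1/(4Nm + 1); the function names are hypothetical, and real analyses use multilocus estimators with sampling corrections.

```python
def fst_from_frequencies(freqs):
    """FST for one biallelic locus: variance of allele frequency among
    populations divided by p_bar * (1 - p_bar).
    Assumption: populations weighted equally, no sample-size correction."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic across populations")
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k
    return var_p / (p_bar * (1 - p_bar))


def island_model_nm(fst):
    """Solve Wright's island-model approximation FST = 1 / (4Nm + 1) for Nm.
    Assumption: migration-drift equilibrium holds."""
    return (1 / fst - 1) / 4


# Hypothetical allele frequencies in four populations.
fst = fst_from_frequencies([0.2, 0.4, 0.6, 0.8])
print(f"FST = {fst:.3f}, implied Nm = {island_model_nm(fst):.2f}")
```

For these four populations the sketch gives FST = 0.2 and an inferred Nm of about one migrant per generation. The conversion holds only at migration-drift equilibrium, which is precisely the assumption that recently fragmented populations are unlikely to meet.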
Gene flow has often been modeled differently in subdivided and continuous populations. For subdivided populations, the indirect approach of F-statistics, as described above, is usually employed. In contrast, for continuous populations, it might be more common to estimate neighborhood size, based on Wright’s isolation by distance approach (see Box B). This latter approach is not an indirect method.
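For the continuous case, the quantity at the heart of Wright's isolation by distance approach is the neighborhood size, Nb = 4πσ²D, where σ² is the axial variance of parent-offspring dispersal distances and D is the density of breeding individuals. The sketch below assumes isotropic dispersal centred on the parent and consistent units (for example, metres and individuals per square metre); the function names and example displacements are hypothetical.

```python
import math


def axial_variance(displacements):
    """Axial variance sigma^2 from (dx, dy) parent-offspring displacements.
    Assumption: isotropic dispersal with zero mean displacement, so both axes
    contribute equally and the mean is not subtracted."""
    n = len(displacements)
    return sum(dx * dx + dy * dy for dx, dy in displacements) / (2 * n)


def neighborhood_size(sigma2, density):
    """Wright's neighborhood size Nb = 4 * pi * sigma^2 * D."""
    return 4 * math.pi * sigma2 * density


# Hypothetical displacements in metres and a density of 0.01 adults per m^2.
sigma2 = axial_variance([(3.0, 4.0), (-3.0, -4.0), (10.0, 0.0), (0.0, -6.0)])
print(f"sigma^2 = {sigma2:.1f} m^2, Nb = {neighborhood_size(sigma2, 0.01):.1f}")
```

Because σ² can be measured from observed dispersal events rather than inferred from allele frequencies, this parameter can in principle be estimated directly.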
The indirect approach of using F-statistics or F-statistic-like methods
to estimate gene flow, evolutionary lineages, and population relationships
has made valuable contributions to evolutionary biology (Neigel 1997).
However, this approach can be misapplied to studies on an ecological time
scale (e.g., Steinberg abstract, Part
III of these proceedings; Steinberg and Jordan 1997). The result is that
the literature in conservation biology includes many studies which report
alleged levels of gene flow, based on FST estimates,
that reflect long-term history, not ongoing processes. For the purposes
of answering gene flow questions on an ecological time scale, FST
methods are not advisable, and should be regarded as mere descriptors of
genetic structure, along with other measures of genetic diversity. The
computational robustness of FST is one of its statistical
advantages, but its insensitivity to rare alleles results in an estimate
that ignores on-going dynamics that are directly relevant to the interests
of ecologists, conservation biologists, and ecosystem managers. We do not
discount the utility of genetic structure statistics for conservation or
management objectives. In fact, if one wishes to measure them, recent work
on optimal sample size can provide some useful guidelines on how to maximize
sampling effort (see Fernandez and
Petit abstracts, Part III in these
proceedings). Nonetheless, we conclude that, for the study of ongoing gene
flow, indirect approaches are not appropriate.
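Why FST-based estimates reflect long-term history rather than ongoing processes can be illustrated with Wright's infinite-island recursion, F' = (1 - m)^2 [1/(2N) + (1 - 1/(2N)) F]. The sketch below assumes equal deme size N, a single change to a new constant migration rate m, and no mutation, and counts how many generations F needs to move halfway to its new equilibrium after fragmentation reduces m. The function names and parameter values are hypothetical.

```python
def next_f(f, n, m):
    """One generation of Wright's infinite-island recursion for F (FST).
    Assumptions: equal deme size n, migration rate m, no mutation."""
    return (1 - m) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * f)


def equilibrium_f(n, m):
    """Exact fixed point of next_f; close to 1 / (4Nm + 1) when m is small."""
    a = (1 - m) ** 2
    return (a / (2 * n)) / (1 - a * (1 - 1 / (2 * n)))


def generations_to_halfway(n, m_old, m_new):
    """Generations for F to close half the gap between the old and new
    equilibria after migration changes from m_old to m_new."""
    f = equilibrium_f(n, m_old)
    f_target = equilibrium_f(n, m_new)
    half_gap = abs(f_target - f) / 2
    t = 0
    while abs(f_target - f) > half_gap:
        f = next_f(f, n, m_new)
        t += 1
    return t


# Hypothetical demes of 50 breeders; fragmentation cuts m from 0.01 to 0.001.
print(generations_to_halfway(50, 0.01, 0.001))
```

With these values F needs about 58 generations to close half the gap; for long-lived trees that spans centuries, long after the change in gene flow that a conservation study would want to detect.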
Direct methods using parentage analysis
For the study of gene movement on an ecological time scale, parentage
analysis in the sense of Roeder et al. (1989), Adams and Birkes (1991),
Devlin and Ellstrand (1990), and Smouse and Meagher (1994) is currently
the most effective approach (see Adams
and Nason abstracts, Part III of these
proceedings). This form of gene movement is part of the dynamics
of gene flow; however, we caution that this measure cannot be interpreted
as interpopulation gene flow characterized by Nm,
the effective number of migrants per generation on an evolutionary time
scale. Moreover, parentage-based estimates of gene movement measure
immigration into a circumscribed area, which may or may not be a "population".
However, one can use parentage analysis to estimate the distribution of
dispersal distances, sometimes yielding a dispersion curve analogous to
that of Wright's Isolation by Distance model (see Box
B). One can also use parentage analysis to examine pollen- or seed-mediated
gene movement. Here we focus on four related models that provide estimates
of pollen-mediated gene movement. The general model of parentage analysis
uses progeny from known maternal parents to assign paternity to a set of
potential pollen donors, whereas the strength of the other models lies in estimating
the rate of pollen immigration from outside the experimental population.
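The exclusion logic shared by these parentage models can be sketched for codominant markers: given a known mother and her seed, the paternal allele at each locus is constrained, and any candidate male lacking a compatible allele at some locus is excluded. This is a minimal sketch, assuming error-free genotypes, no mutation, and no null alleles; it is simple exclusion, not the likelihood-based assignment of Roeder et al. (1989) or Smouse and Meagher (1994), and the function names and genotypes are hypothetical.

```python
def possible_paternal_alleles(mother, offspring):
    """Alleles the father could have contributed at one locus.
    Genotypes are 2-tuples; an empty set means the mother is incompatible."""
    a, b = offspring
    options = set()
    if a in mother:
        options.add(b)  # mother gave a, so father gave b
    if b in mother:
        options.add(a)  # mother gave b, so father gave a
    return options


def is_compatible(mother, offspring, male):
    """True if `male` is not excluded at any locus.
    Each argument is a list of per-locus genotypes, e.g. [(1, 2), (5, 6)].
    Assumption: error-free genotypes, no mutation or null alleles."""
    for m_g, o_g, f_g in zip(mother, offspring, male):
        if not possible_paternal_alleles(m_g, o_g) & set(f_g):
            return False
    return True


def compatible_fathers(mother, offspring, candidates):
    """Names of candidate males that cannot be excluded as the seed's father."""
    return [name for name, geno in candidates.items()
            if is_compatible(mother, offspring, geno)]


# Hypothetical two-locus genotypes.
mother = [(1, 2), (5, 6)]
candidates = {"A": [(3, 4), (7, 7)], "B": [(3, 3), (6, 8)], "C": [(2, 4), (7, 8)]}
print(compatible_fathers(mother, [(1, 3), (5, 7)], candidates))  # ['A']
print(compatible_fathers(mother, [(1, 9), (5, 7)], candidates))  # []
```

A seed with no compatible candidate, like the second one, is scored as an apparent immigrant; a seed sired from outside the area by a male whose gamete happens to match a local male would go undetected, which is the cryptic gene flow discussed below.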
Individual paternity. If your objective is to quantify within-population patterns of pollen movement and individual male reproductive success (RS, including selfing), then the methods of Roeder et al. (1989; see also Smouse and Meagher 1994) provide the greatest detail. Basically, this approach assumes that the focal population is isolated from outside pollen sources and that the genotypes of all potential males are known. Potential problems with this method are that it can require extensive sampling of progeny per female and, owing to constraints on assayable genetic information, often requires the number of potential pollen donors to be relatively small. Moreover, these methods do not adjust estimates (and variances) of male RS for cryptic gene flow. This adjustment is important, because cryptic gene movement biases estimates of male fertility unevenly for males with low and high RS. Nason (in prep.) is working on a modification of this method to make the adjustment (Nason abstract, Part III of these proceedings). However, even with an adjustment for cryptic gene flow, this approach may underestimate fertility differences among males (Adams 1992a; Adams 1992b). This paternity approach is useful for generating a pollen dispersion curve and for estimating gene movement from outside a circumscribed area (although, as noted below, there are more powerful methods for estimating gene "immigration"). (This analysis can be performed with PollenGF by Nason (Appendix A and the NCEAS website) or with software available from Devlin.)
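The constraint on assayable genetic information can be made concrete with the exclusion probability: the chance that a random unrelated male is excluded as the father of a random seed from a random known mother. The sketch below computes it exactly for one codominant locus by enumerating Mendelian outcomes, then combines unlinked loci as 1 - prod(1 - P_i). It assumes Hardy-Weinberg proportions, linkage equilibrium, and error-free genotypes; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
import math


def possible_paternal_alleles(mother, offspring):
    """Alleles the father could have contributed at one locus."""
    a, b = offspring
    options = set()
    if a in mother:
        options.add(b)
    if b in mother:
        options.add(a)
    return options


def hardy_weinberg_genotypes(freqs):
    """Yield (genotype, probability) pairs for an allele -> frequency mapping.
    Assumption: Hardy-Weinberg proportions."""
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        yield (a, b), freqs[a] * freqs[b] * (1 if a == b else 2)


def exclusion_probability(freqs):
    """Exact single-locus exclusion probability with the mother known."""
    total = 0.0
    for mother, p_mother in hardy_weinberg_genotypes(freqs):
        for maternal in mother:                    # each maternal allele, prob 1/2
            for paternal, p_pat in freqs.items():  # true father's allele
                options = possible_paternal_alleles(mother, (maternal, paternal))
                q = sum(freqs[x] for x in options)
                # a random male is excluded if he carries none of the options
                total += p_mother * 0.5 * p_pat * (1 - q) ** 2
    return total


def combined_exclusion(per_locus):
    """Multilocus exclusion for unlinked loci: 1 - prod(1 - P_i)."""
    return 1 - math.prod(1 - p for p in per_locus)


print(exclusion_probability({"A": 0.5, "B": 0.5}))  # 0.1875
```

A biallelic locus never exceeds 3/16 = 0.1875, which is why paternity studies need several highly polymorphic loci; four unlinked loci with P = 0.5 each, for example, give 1 - 0.5^4 ≈ 0.94.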
Neighborhood model. The neighborhood model of Adams and Birkes (1991) groups fathers by distance and fits a dispersal function to the data instead of estimating individual male RS. This approach provides estimates of selfing, the probability of within-population dispersal as a function of inter-mate distance, and pollen movement into an experimentally defined population. The neighborhood model is similar to the pollen gene movement model, but differs in that it does not estimate fertilities of individual males within a circumscribed area or neighborhood. Instead, it estimates parameters relating mating success to factors such as distance, relative pollen fertility, or tree size (e.g., Adams abstract, Part III of these proceedings). The individual paternity model can also be used to estimate the relationship between mating success and these same factors by using individual male fertilities. The individual paternity model makes no assumptions about fertilities, but it estimates them poorly. The neighborhood model, by contrast, requires specifying reasonable models from which estimates of the model parameters can be derived. This approach works best when individuals are evenly distributed, although this spatial pattern is not a requirement. The available program is limited to situations where pollen (or egg, for seed dispersal) haplotypes can be determined (possible with embryo-megagametophyte systems in conifers or when DNA markers from male-inherited organelles are used). (See the program by Adams and Birkes (Appendix A) and the NCEAS website.)
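The structure of a neighborhood-type likelihood can be sketched as a mixture: each seed's pollen gamete comes from selfing (probability s), from outside the neighborhood (m), or from neighbor j with a share of 1 - s - m that declines with distance. This sketch assumes pollen alleles are observed directly at a single locus (as in the conifer megagametophyte systems mentioned above) and uses a hypothetical exponential weight exp(-b*d); it is not the Adams and Birkes (1991) program, whose parameterization may differ, and all names and data are hypothetical.

```python
import math


def mating_probabilities(distances, s, m, b):
    """Selfing, immigration, and per-neighbor mating probabilities.
    Hypothetical parameterization: neighbor weights decay as exp(-b * d)."""
    weights = [math.exp(-b * d) for d in distances]
    total = sum(weights)
    return s, m, [(1 - s - m) * w / total for w in weights]


def gamete_prob(allele, genotype):
    """Mendelian probability that a diploid genotype transmits `allele`."""
    return genotype.count(allele) / 2


def log_likelihood(pollen_alleles, mother, neighbors, distances, bg_freqs, s, m, b):
    """Log-likelihood of one mother's observed pollen-gamete alleles.
    bg_freqs gives allele frequencies of the outside pollen pool; each observed
    allele must have nonzero probability under at least one pollen source."""
    p_self, p_imm, p_nb = mating_probabilities(distances, s, m, b)
    ll = 0.0
    for a in pollen_alleles:
        like = (p_self * gamete_prob(a, mother)
                + p_imm * bg_freqs.get(a, 0.0)
                + sum(p * gamete_prob(a, g) for p, g in zip(p_nb, neighbors)))
        ll += math.log(like)
    return ll


# Hypothetical data: four seeds all carrying pollen allele 1; the nearer
# neighbor (10 m) is homozygous 1/1, the farther one (100 m) is 2/2.
args = ([1, 1, 1, 1], (3, 3), [(1, 1), (2, 2)], [10.0, 100.0], {1: 0.5, 2: 0.5})
for b in (0.0, 0.05):
    print(b, log_likelihood(*args, s=0.0, m=0.1, b=b))
```

Maximizing such a likelihood over s, m, and b (for example by grid search or a numerical optimizer) yields distance-dependent mating parameters without estimating each male's fertility; here the steeper decay fits better because the nearby male carries the observed allele.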
Pollen gene movement model. This method extends the paternity exclusion approach developed by Devlin and Ellstrand (1990) to estimate both the apparent and cryptic components of total immigration. So far, this approach has been applied to patchily distributed populations (e.g., Ellstrand and Marshall 1985; Hamrick and Schnabel 1986), but see the application of this model to northern red oak in a continuous stand (Dyer abstract, Part III of these proceedings). Nason (in prep.) is modifying this model so that it jointly estimates individual fertilities within a circumscribed area and immigration from outside that area. Both this model and the neighborhood model described above can be applied to artificially circumscribed populations within larger continuous populations (e.g., Dyer abstract, Part III of these proceedings) or to isolated population patches (Nason and Hamrick 1997). (See PollenGF by Nason (Appendix A) and on the NCEAS website for these proceedings.)
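The idea behind correcting apparent immigration for cryptic immigration can be sketched as follows: if a fraction d of immigrant pollen gametes would be recognizable as foreign (no sampled male could have produced them), then the observed apparent immigration rate estimates m*d, and total immigration is approximately the apparent rate divided by d. This is a simplified illustration in the spirit of Devlin and Ellstrand (1990), not their exact estimator; it assumes the paternal haplotype of each seed is known, loci are unlinked, and immigrant haplotypes are drawn at random from a background pool of known allele frequencies. The Monte Carlo estimate of d and all function names are assumptions.

```python
import random


def can_sire(male, haplotype):
    """True if a diploid male (list of per-locus genotypes) can transmit the haplotype."""
    return all(allele in geno for allele, geno in zip(haplotype, male))


def detection_probability(males, bg_freqs, n_draws=100_000, seed=1):
    """Monte Carlo estimate of d: the chance that an immigrant haplotype drawn
    from background frequencies (one allele -> freq dict per locus) could not
    have come from any sampled male, i.e. would be scored as an apparent immigrant.
    Assumption: unlinked loci, immigrant pool with known frequencies."""
    rng = random.Random(seed)
    loci = [(list(f), list(f.values())) for f in bg_freqs]
    detected = 0
    for _ in range(n_draws):
        hap = [rng.choices(alleles, weights)[0] for alleles, weights in loci]
        if not any(can_sire(male, hap) for male in males):
            detected += 1
    return detected / n_draws


def total_immigration(n_apparent, n_seeds, d):
    """Apparent immigration rate scaled up to include cryptic immigrants."""
    if d <= 0:
        raise ValueError("immigrant pollen is undetectable with these markers")
    return (n_apparent / n_seeds) / d


# Hypothetical: one local male (1/1) at one locus; background pool 1:0.5, 2:0.25, 3:0.25.
d = detection_probability([[(1, 1)]], [{1: 0.5, 2: 0.25, 3: 0.25}])
print(round(d, 2), total_immigration(10, 100, d))
```

In this toy case about half of immigrant gametes are detectable, so 10 apparent immigrants among 100 seeds imply a total immigration rate near 0.2.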
Multiple population gene movement model. Another modification
of the parentage approach has been developed by Kaufman, Smouse, and Alvarez-Buylla
(Kaufman et al. 1998; see also Smouse et
al. abstract, Part III of these proceedings). Unlike the neighborhood
and pollen gene movement models described above, in which pollen migration
into the study population is assumed to come from a single source, this model
accommodates multiple source populations. The current version is restricted to plant populations
where all source populations can be identified and sampled.
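Apportioning immigrant pollen among several sampled source populations can be sketched as a mixture problem: each immigrant haplotype has a likelihood under each source's allele frequencies, and an EM algorithm estimates the share of immigrant pollen contributed by each source. This is a generic mixture-model sketch, not the Kaufman et al. (1998) method; it assumes immigrant haplotypes have already been identified, all sources are sampled with known allele frequencies, and loci are unlinked. Function names and data are hypothetical.

```python
import math


def haplotype_likelihood(hap, source):
    """P(haplotype | source) for unlinked loci; `source` is a list of allele -> freq dicts."""
    return math.prod(freqs.get(a, 0.0) for a, freqs in zip(hap, source))


def source_proportions(haplotypes, sources, n_iter=200):
    """EM estimate of the fraction of immigrant pollen from each source."""
    k = len(sources)
    pi = [1.0 / k] * k
    lik = [[haplotype_likelihood(h, s) for s in sources] for h in haplotypes]
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in lik:
            denom = sum(p * l for p, l in zip(pi, row))
            if denom == 0:          # impossible under every source: ignore
                continue
            for j in range(k):      # E-step: posterior source membership
                totals[j] += pi[j] * row[j] / denom
        n = sum(totals)
        if n == 0:
            raise ValueError("no haplotype is compatible with any source")
        pi = [t / n for t in totals]  # M-step: update mixing proportions
    return pi


# Two hypothetical sources, each fixed for a private allele at one locus.
sources = [[{1: 1.0}], [{2: 1.0}]]
print(source_proportions([[1], [1], [1], [2]], sources))  # [0.75, 0.25]
```

When sources share alleles their likelihoods overlap, and the proportions are then estimated with correspondingly less certainty.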
Any of the four models could be modified for seed-mediated gene movement, although such estimates can be more difficult to obtain. Estimating seed movement with molecular markers is hindered by the low mutation rate of cpDNA, which produces very little intrapopulation variation. Yet cytoplasmic markers should not be dismissed, because they can provide valuable information about pollen- and seed-mediated gene movement (see Petit abstract, Part III of these proceedings). Indeed, some species (soybean, rice, and a few wild species) have been found to contain hypervariable cpDNA simple sequence repeat (SSR) sequences that are very promising for seed flow studies (e.g., McCauley 1994; McCauley 1995b). For conservation-motivated research, the extension of these models to seed-mediated movement may be essential for the estimation of colonization probabilities based on genetic markers, and that task lies ahead of us.
The choice among the four methods above is determined by the question. If we want variance in male fertilities within an area, as well as gene movement, then we need to use either the individual paternity or the pollen gene movement model. Both will require high exclusion probabilities and a large number of progeny per mother. Alternatively, if we are interested in gene movement into an area, then lower exclusion probabilities and sample sizes may be adequate. The last three approaches can accomplish this estimation, although the neighborhood model can currently be used only where pollen haplotypes can be determined, chiefly in gymnosperms. By reducing the exclusion probability and sample sizes per mother, one could sample more sites. (See Box C for optimal sampling strategy and Table 1 for optimal sample sizes to estimate gene flow events.)
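The trade-off between exclusion probability and sample size can be sketched with simple binomial arithmetic. Assuming seeds are independent (ignoring correlated paternity within mothers), a true immigration rate m, and a detection probability P for immigrant pollen, each seed is an apparent immigrant with probability mP. The functions below give the number of seeds needed to observe at least one apparent immigrant with a chosen confidence, and to estimate m as (apparent rate)/P with a target standard error. These are illustrative calculations with hypothetical names, not the procedure behind Box C or Table 1.

```python
import math


def seeds_to_detect(m, p_detect, confidence=0.95):
    """Smallest n with P(at least one apparent immigrant) >= confidence,
    i.e. 1 - (1 - m * p_detect) ** n >= confidence.
    Assumption: seeds are independent binomial trials."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - m * p_detect))


def seeds_for_se(m, p_detect, target_se):
    """Smallest n giving SE(m_hat) <= target_se for m_hat = apparent rate / p_detect,
    using the binomial variance of the apparent rate."""
    q = m * p_detect
    return math.ceil(q * (1 - q) / (p_detect * target_se) ** 2)


# Hypothetical true immigration rate of 5%, at 80% and 90% detection.
for p_detect in (0.8, 0.9):
    print(p_detect, seeds_to_detect(0.05, p_detect), seeds_for_se(0.05, p_detect, 0.015))
```

In this simple calculation, raising detection from 0.8 to 0.9 trims the required sample only modestly, consistent with the suggestion that lower exclusion probabilities can be traded for sampling more sites.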
The use of parentage models to evaluate pollen-mediated gene flow is often quite effective at demonstrating the consequences of pollination. However, this approach can be complemented effectively with directly measured ecological data such as pollinator behavior or seedling establishment. In some cases, pollinator behavior may be easier to study and equally informative about the nature of pollen-mediated gene flow (Campbell abstract, Part III of these proceedings).
In conclusion, we recommend the use of genealogically based direct estimates for small-scale measurement of local gene movement. While this approach has limitations (see next section), numerous studies have already used it to study gene flow in fragmented stands (Ellstrand 1992; Ellstrand and Marshall 1985; Hamrick 1992; Hamrick et al. 1995) and, to a lesser extent, in continuous populations (Adams and Birkes 1991; Friedman and Adams 1985; Dyer and Sork, in prep.). The choice among the four methods above should be determined by your question. If you want variance in male fertilities as well as gene movement within an area, then you need to use the individual paternity or pollen gene movement approach; in both cases, you will need high exclusion probabilities and a large number of progeny per mother. If you are more interested in gene movement into an area, then the last three models will all be appropriate, and lower exclusion probabilities and sample sizes may be adequate. These changes in sampling strategy would permit sampling of more sites.
Shortcomings of Direct Methods
The study of fine-scale gene flow and relative male fertility is best accomplished with parentage-type analyses. Genetic markers currently have enough resolution and power to model fine-scale gene movement with some precision. However, a major weakness of parentage analyses is that they tell us relatively little about the nature of unassigned paternity (i.e., the source of pollen from outside a circumscribed area). This unassigned paternity could come from 10 m or 1000 m outside the area. If the study of gene flow is to expand to involve longer-distance movement of genes between populations or to address patterns of gene flow across increasingly larger spatial scales, it is essential to identify the particular limitations inherent in parentage analysis experiments and to suggest modifications that will allow a successful scaling up of our questions.
First, the emphasis of paternity analyses is steadily shifting away from a strict assignment of paternity and toward answering questions concerning the factors that might be contributing to the levels of apparent gene flow. For many plant populations, rates of gene flow are much higher than had been predicted, and confusion immediately arises when attempting to determine the patterns of long-distance gene flow. For example, should the scale of the paternity analysis simply be extended to include more putative fathers? If so, then increasing the scale of the paternity analysis will bring about a concomitant increase in the labor involved and a loss of genetic resolution. Moreover, the effort in identifying, sampling, genotyping, and mapping the positions of all putative fathers in the study plot may be prohibitive for most research projects.
Second, a distinction must be made between the study of gene flow, via paternity analysis, in fragmented populations and in continuous populations. Logistically, fragmented populations are easier to handle because of the smaller number of potential fathers in the immediate vicinity. Even if gene flow occurs over great geographical distances, a fragmented landscape will include fewer potential fathers than a continuous landscape. However, given the many differences among species in pollination syndrome, fragmentation structure, background environmental matrix, and a multitude of other potentially confounding environmental variables, it is natural to ask whether studies confined to fragmented habitats are applicable to species with continuous distributions.
Finally, parentage analysis of gene movement is restricted in both temporal
and spatial scale. In most cases, paternity analyses are conducted on a
limited number of maternal trees, for one or two years and in a single
geographic site. Estimates of gene flow based on these studies have little
replication to evaluate their variance. Year-to-year variation in pollen
production or reception and specific geographic or maternal idiosyncrasies
preclude drawing general patterns from a single paternity
analysis (see Hamrick abstract, Part
III of these proceedings). Thus, eventually it may be necessary to shift
away from paternity analyses for questions that involve larger spatial
and temporal scales.
Gene flow and adaptation
Workshop discussions focused largely on gene flow alone, with little regard to the importance of locally adapted genotypes. However, it is clear that gene flow among some populations could result in reductions in progeny fitness (Savolainen abstract, Part III of these proceedings). Genetic surveys that are designed to estimate gene flow could also be used to examine the consequences of gene flow for conservation and management purposes (for discussion of optimal sampling for surveys, see Petit abstract, Part III of these proceedings). Indeed, such surveys are meant to identify diploid immigrants (seed flow), haploid immigrants (pollen flow), within-population outcrossed progenies, and selfed progenies. An evaluation of the relative fitness of these different classes of progeny would increase our understanding of the consequences for the viability and adaptability of recipient populations. Numerous studies have demonstrated reductions in the relative fitness of selfed versus outcrossed progeny, particularly in predominantly outcrossing species. Habitat modification associated with human activities has, in some cases, been correlated with increased rates of selfing, though effects on progeny fitness have not been examined in this context.
Gene flow is considered an important force for the maintenance of genetic diversity. In addition, high amounts of gene flow will reduce inbreeding. However, gene flow also has the potential to introduce poorly adapted genes (outbreeding depression) that can reduce the viability of the population. While it is not clear how likely it is that increased gene flow will result in outbreeding depression, the possibility illustrates the connection between gene flow and local adaptation. Populations that now occupy altered landscapes are likely to experience different patterns of future gene flow than those experienced over a longer period in the past (Savolainen abstract, Part III of these proceedings). If ecological conditions are changing (e.g., global change), gene flow could introduce genes adapted to the new conditions (e.g., for Scots pine in Finland, genes from the southern part of the country may be well suited to climatic warming in the north).
Finally, if the regional population system functions as a metapopulation, with frequent local extinction and recolonization, the system as a whole will only persist if colonization of new patches by seeds occurs with sufficient probability. We conclude that an awareness of the fitness consequences of gene flow should be a prominent feature of future gene flow studies.
Metapopulation and landscape approaches to gene flow
The models of infinite-island gene flow, metapopulations, and landscape
ecology appear as though they should be quite compatible and complementary (see
Fig. 1). All three perspectives are interested in movement between populations.
However, the assumptions of infinite island models that estimate Nem
are quite different from those of Levins' (1970) classical metapopulation
model based on extinction and colonization dynamics. We are starting to
find landscape modeling approaches applied to genetic questions. For example,
Antonovics et al. (1977) have developed a spatially explicit version
of the metapopulation approach. In some cases, incorporation
of metapopulation models can provide new insight into the frequency of specific
traits such as self-incompatibility alleles in plant populations (e.g.,
Gilpin abstract, Part III of these proceedings). The
advantage of the metapopulation and landscape approaches is that they can
operate on the landscape scale (see review in McCauley 1995a). Unfortunately,
the gap between genetic migration studies and metapopulation migration
studies is large (Antonovics 1997). Yet a synthesis of genetic and demographic
approaches should be mutually beneficial, because both population genetics
and population ecology require estimates of migration (see Hanski and Simberloff
1997; Hanski and Gilpin 1991). Here, we focus on existing models that might
be relevant to genetic studies.
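Levins' (1970) model makes the contrast with Nem concrete: it tracks the fraction p of occupied patches, dp/dt = c*p*(1 - p) - e*p, with colonization rate c and extinction rate e, and reaches equilibrium occupancy p* = 1 - e/c when c > e. Its migration parameter is a colonization rate, not a number of gene copies exchanged per generation. The sketch below, with hypothetical function names and parameter values, integrates the model by forward Euler steps.

```python
def levins_equilibrium(c, e):
    """Equilibrium occupancy p* = 1 - e/c; zero if extinction outpaces colonization."""
    return max(0.0, 1 - e / c)


def simulate_levins(p0, c, e, dt=0.01, t_max=200.0):
    """Forward-Euler integration of dp/dt = c * p * (1 - p) - e * p.
    Assumption: dt small enough for the Euler scheme to be stable."""
    p = p0
    for _ in range(int(round(t_max / dt))):
        p += dt * (c * p * (1 - p) - e * p)
    return p


# Hypothetical rates: colonization 0.5, extinction 0.1 per unit time.
print(levins_equilibrium(0.5, 0.1), round(simulate_levins(0.1, 0.5, 0.1), 3))
```

Under this model the regional system persists only when colonization exceeds extinction (c > e), which is why the colonization probabilities of seeds matter so much for fragmented populations.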

Few models are available that explicitly analyze gene flow from metapopulation or landscape perspectives, and there are virtually no general models for gene flow. However, there are different types of spatially explicit models that have potential applicability to gene flow studies (Davis abstract, Part III of these proceedings). The first category consists of biological transport models, individual-based / cellular automata models (e.g., EcoBeaker, by E. Meir), and metapopulation models (e.g., RAMAS-GIS, ALEX; Lindenmayer et al. 1995). A second category consists of physical transport models (e.g., FETCHR). One example of a spatially explicit model is Steinberg and Jordan's (1997) individual-based modeling approach (see Steinberg abstract, Part III of these proceedings). Their approach to connecting demography and genetics ('virtual pocket gophers') could easily be adapted to include spatial or temporal heterogeneity. Alternatively, object-oriented models would be amenable to layering landscape, demographic, and genetic processes (Davis abstract, Part III of these proceedings). The utility of any of these models for describing gene flow processes has not received much attention (but see Antonovics 1997; Gilpin 1991; McCauley 1995a).
An unresolved question is whether spatially explicit modeling offers any benefits to population geneticists. We suggest that this approach could have useful applications in some situations. For example, understanding pollen flow patterns via wind transport vectors (e.g., wind channels) would provide a means of testing hypotheses about the influences of landscape changes. Spatially explicit mapping also offers a means of mapping different selection regimes (e.g., soil types, elevation). Finally, the measurement of gene flow within a landscape mosaic allows one to measure 'ecological distance' between populations, as well as direct physical distance, and the two may have divergent implications for gene flow. When spatially explicit genetic data and environmental data are available for the same landscape, one could test several hypotheses about the impact of "ecological distances" on gene flow or the influence of environmental variables on gene flow.
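One common way to operationalize 'ecological distance' in landscape genetics is least-cost path distance across a resistance surface, computed with Dijkstra's algorithm. This technique is not proposed in the workshop discussion; the sketch below assumes a grid of hypothetical per-cell resistance values, moves between the four adjacent cells, and a step cost equal to the mean resistance of the two cells. Function names and values are hypothetical.

```python
import heapq


def least_cost_distance(resistance, start, goal):
    """Least-cost path distance between two cells of a resistance grid.
    Assumption: 4-neighbour moves, each step costing the mean resistance
    of the two cells it connects."""
    rows, cols = len(resistance), len(resistance[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + (resistance[r][c] + resistance[nr][nc]) / 2
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return float("inf")


# Hypothetical landscape: a band of unsuitable habitat (resistance 100)
# separates two populations at (0, 0) and (2, 0).
landscape = [[1, 1, 1],
             [100, 100, 1],
             [1, 1, 1]]
print(least_cost_distance(landscape, (0, 0), (2, 0)))  # 6.0
```

In this toy landscape the two populations are two cells apart in a straight line but six cost units apart ecologically; such distances could then be compared with genetic measures alongside physical distance.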
From a landscape modeling perspective, migration is important when considering the contribution of genetics to conservation and management. Integrating genetic and demographic data, or interpreting either genetic or demographic processes with respect to the other, requires the ability to translate the movement of genes (gene flow) into the migration of individuals (or of pollen/seeds) and vice versa. To make this translation (e.g., via simulations), it would be useful to have information on distributions of dispersal or gene flow distances, rather than average (i.e., Nem) estimates. So far, the type of migration parameter needed to connect genetic and demographic models is not being measured.
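Why a distribution of dispersal distances carries more information than an average can be shown with two hypothetical kernels that share the same mean distance: a single exponential, and a mixture in which 5% of dispersal events follow a long-tailed component. The kernel shapes and values below are illustrative assumptions, not estimates from any study, and the function names are hypothetical.

```python
import math


def mixture_mean(components):
    """Mean distance for a mixture of exponential kernels given as (weight, mean) pairs."""
    return sum(w * mu for w, mu in components)


def tail_probability(components, x):
    """P(distance > x) for a mixture of exponential distance distributions."""
    return sum(w * math.exp(-x / mu) for w, mu in components)


# Hypothetical kernels (metres) with the same 50 m mean dispersal distance.
thin_tailed = [(1.0, 50.0)]
fat_tailed = [(0.95, 25.0 / 0.95), (0.05, 500.0)]

for name, kernel in (("thin", thin_tailed), ("fat", fat_tailed)):
    print(name, round(mixture_mean(kernel), 1), f"{tail_probability(kernel, 1000.0):.2e}")
```

Both kernels have a mean of 50 m, yet dispersal beyond 1 km is vanishingly rare under the first and occurs in roughly 0.7% of events under the second; simulated colonization of distant patches would behave very differently under the two.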
From the perspective of metapopulation or landscape models of plant populations, seed dispersal data are more important than pollen dispersal data. While seed and pollen movement can be quite different and influence genetic structure differentially, for demographic processes (e.g., colonization), seed dispersal, or dispersal of vegetative propagules for many species, is the key. Use of maternally inherited markers (e.g., Demesure et al. 1996; Dumolin et al. 1995; McCauley 1994; McCauley et al. 1995) and paternally inherited markers, in conjunction with nuclear markers, would allow examination of both seed and pollen dispersal.
A key issue for many ecological, conservation, and management studies is the adoption of a proper landscape scale. It would be useful to have genetic models that integrate both spatial variability (e.g., heterogeneous landscapes) and temporal variability (e.g., metapopulation dynamics), both to examine how these types of variation influence the genetic structure of populations and to consider how they influence our interpretations of genetic structure. The application of landscape models necessitates larger scales of study. Obviously, this will often be logistically difficult. Large-scale studies will be most tractable in small isolated populations, such as Kaufman et al.'s Cecropia study (Kaufman et al. 1998; see also Smouse et al. abstract, Part III of these proceedings), tropical trees in fragments (e.g., Nason 1997; Stacy et al. 1996), or populations following a river course (linear population arrays). Scaling up genetic studies might require careful selection of study systems in order to measure parameters that can then be modeled. Another approach to asking landscape-scale questions (e.g., regarding long-distance gene flow) would be to focus on the edges of species ranges, where populations are smaller and more fragmented, permitting examination of associations between the distance/size of fragments and gene flow patterns. However, this approach may give a biased picture relative to more centrally located populations.
Finally, we want to emphasize that most currently fragmented populations of interest to the conservation biologist were probably not metapopulations over extended evolutionary time. Landscape alteration has created metapopulations out of formerly continuous populations. In most cases, extinction-recolonization dynamics have only recently been imposed through habitat loss and fragmentation. Temporal scale is thus an important consideration in genetic applications of metapopulation models. Because (a) we do not know what kind of metapopulation has been created (i.e., disequilibrated, patchy, classical?), and (b) we do not know where the metapopulation is headed, methods must be sensitive to recent shifts in gene flow patterns. We conclude that standard indirect methods may not be sufficiently sensitive to estimate recent changes in gene flow.
Part II. TWO ESSAYS ON NEW APPROACHES
Scaling-up: Enlarging the spatial scale of parentage analysis
by John Nason
In many cases the spatial and temporal dimensions at which gene movement can be effectively investigated fail to encompass the scale of interest. Indirect methods of estimating the effective number of migrants per generation (Nm) from measures of variation in gene frequencies (e.g., FST) can be used over a broad range of spatial scales but reflect the cumulative effect of migration over an evolutionary time scale. Direct, parentage-analysis-based methods, in contrast, estimate contemporary rates of gene movement but have been limited to relatively modest spatial scales in their application. Given ecological, evolutionary, and management-oriented interests in current patterns of gene movement within and among populations, it is worth considering whether and how parentage analysis methods can be extended to investigate dispersal processes occurring over larger spatial scales.
Due largely to methodological factors, available analytical models have not been used to their full capabilities to resolve long-distance pollen dispersal events. Many estimates of the rate of effective pollen immigration into experimental populations have come from experiments specifically designed to examine individual male reproductive success and its ecological correlates. The power of state-of-the-art paternity analysis models (Roeder et al. 1989; Smouse and Meagher 1994) to provide detailed information on relative male fertilities decreases as the spatial scale of the experimental population and the number of potential pollen donors increase. Moreover, because these models assume the absence of cryptic pollen immigration (immigrant pollen gametes with genotypes indistinguishable from ones that could be produced within the population), they have been applied primarily to relatively small, spatially isolated populations. As a result, experimental designs optimized for paternity analysis have often been somewhat unnatural and generally suboptimal for quantifying the tail of the effective pollen dispersal distribution.
One means of increasing the spatial scale of parentage analysis is to decouple studies of pollen immigration from paternity analysis. The spatial scale can then be extended as far as our ability to detect apparent immigration events, given available levels of assayable genetic variation, allows. Because rates of apparent pollen immigration into experimentally defined populations have often been relatively high, pollen gene movement could be quantified over larger spatial scales by successively enlarging the size (e.g., radius) of these populations until apparent immigrant pollen gametes could no longer be detected. Importantly, the major assumption of exclusion-based methods of estimating total pollen immigration from the observed frequency of apparent immigration events (i.e., Devlin and Ellstrand 1990) is that the genotypes of immigrant pollen gametes can be modeled as random draws from a large source population of known allele frequencies. As a result, these estimators are not limited in the types of population structure, continuous or discontinuous, to which they can be applied.
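The general logic of exclusion-based estimation can be sketched in a few lines: the observed rate of apparent immigration is divided by the probability that a randomly drawn immigrant gamete would be excluded by every local male, and so be recognized as an immigrant. This is a minimal Python sketch of that idea, not the published Devlin and Ellstrand (1990) estimator. It assumes the paternal gamete of each seed is already known, codominant and unlinked loci, and a source pool of known allele frequencies in linkage equilibrium; the function names are hypothetical.

```python
from itertools import product
from math import prod


def detection_probability(males, source_freqs):
    """Probability that a random immigrant pollen gamete is excluded by every local male.

    males: list of multilocus diploid genotypes, each a list of (allele, allele) tuples.
    source_freqs: list with one {allele: frequency} dict per locus for the source pool.
    """
    p_detect = 0.0
    # Enumerate every multilocus haploid gamete the source pool could send.
    for gamete in product(*(freqs.items() for freqs in source_freqs)):
        alleles = [allele for allele, _ in gamete]
        prob = prod(f for _, f in gamete)  # linkage equilibrium assumed
        # A local male could have produced this gamete only if he carries
        # its allele at every locus.
        compatible = any(
            all(a in geno[locus] for locus, a in enumerate(alleles))
            for geno in males
        )
        if not compatible:
            p_detect += prob
    return p_detect


def total_immigration(n_apparent, n_seeds, p_detect):
    """Scale the apparent-immigration rate by the detection probability, capped at 1."""
    return min(1.0, (n_apparent / n_seeds) / p_detect)


males = [[("A", "A"), ("C", "C")], [("A", "B"), ("C", "C")]]
source = [{"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5}]
p = detection_probability(males, source)
print(p, total_immigration(10, 100, p))
```

Exhaustive enumeration grows multiplicatively with the number of alleles and loci, so for large marker panels a Monte Carlo draw of source gametes would be the practical substitute.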
Other opportunities for enlarging scale involve exploiting particular population configurations and species with specialized forms of correlated mating. Populations that are naturally patchily distributed or linear (e.g., riparian gallery forest), for example, increase the probability of detecting genetically apparent immigration events by decreasing the density and number of within-population pollen sources. The most powerful method of increasing the spatial scale of parentage analysis, however, is to use species that produce singly sired fruit. By permitting very precise reconstruction of paternal genotypes from full-sib progeny arrays, as opposed to inferring microgametic genotypes from individual seeds, this form of correlated mating greatly increases the probability that immigration events will be apparent and thus detectable over a larger spatial scale (e.g., Nason 1997). Although the routine production of singly sired fruit is limited to a few plant taxa (the Asclepiadaceae, mimosoid legumes, the genus Ficus, and the Orchidaceae), these groups are, fortunately, relatively speciose.
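Why a singly sired fruit helps can be seen at a single locus: each full-sib seed either reveals its paternal allele unambiguously or is set aside, and pooling the revealed alleles across the array recovers the father's genotype. The minimal Python sketch below assumes codominant markers, a known maternal genotype, no genotyping error, and a single sire per fruit; the function names are hypothetical.

```python
from collections import Counter


def paternal_alleles(mother, offspring):
    """Count paternal alleles that can be assigned unambiguously at one locus.

    mother: (allele, allele); offspring: list of (allele, allele) genotypes,
    all assumed to be full sibs from one singly sired fruit.
    """
    assigned = Counter()
    for a, b in offspring:
        if a not in mother and b not in mother:
            continue  # inconsistent with the mother: skip (error or other maternity)
        if a == b:
            assigned[a] += 1  # homozygous seed: father transmitted this allele
        elif a in mother and b in mother:
            continue  # seed matches a heterozygous mother: paternal allele ambiguous
        else:
            assigned[b if a in mother else a] += 1
    return assigned


def infer_father(mother, offspring):
    """Reconstruct the sire's genotype from the unambiguous paternal alleles.

    A one-allele result means the sire is apparently homozygous
    (or his second allele was not sampled in this array).
    """
    alleles = sorted(paternal_alleles(mother, offspring))
    if len(alleles) > 2:
        raise ValueError("more than two paternal alleles: array is not singly sired")
    return tuple(alleles)


array = [("A", "C"), ("D", "B"), ("A", "D"), ("C", "B")]
print(infer_father(("A", "B"), array))
```

With a reconstructed diploid sire genotype in hand, exclusion is judged against a full genotype rather than against a single haploid gamete per seed, which is what makes immigrant sires detectable over larger areas.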
Thoughts on a Genetic Structure-like approach to pollen flow
by Peter Smouse
Introduction
It is important to have some sense of how we arrived at this point, so let me begin by reminding us that we initiated the use of parentage analysis in the hope that if we could identify male parentage, we could say something useful about the distribution of male fitness in natural populations. The growing realization that we were going to have to deal with pollen flow from outside the immediate population, initially viewed as an aggravating complication, has now developed into a deeper appreciation of the fact that much of the pollen reaching a circumscribed area comes from somewhere else.
Our initial attempts to model the incoming pollen as drawn from the surrounding (genetically homogeneous) area are now giving way to the thought (and some results) that the 'out-population' pollen may be coming from genetically heterogeneous sources. In many interesting cases, we have no hope of characterizing the much larger panoply of specific males who might provide that incoming pollen, and even our ability to represent them by a sample of males has serious limitations. At the same time, we are concerned that long-distance gene flow cannot be measured directly by any method available to us.
What it comes down to is that if we are now going to treat pollen flow as a measure of inter-population gene flow, we are going to have to change our approach. We have a number of problems and contrasts that need attention, and the resources available will prohibit simply expanding the size of a paternity analysis. For example, we need to know:
(2) whether the incoming pollen cloud, representing (in many cases) a substantial portion of the total male parentage for a localized population, is genetically homogeneous, or whether pollen from different sources or directions or distances is genetically different;
(3) whether the gametic input from males in one year is the same as that for another year, or whether 'it all comes out in the wash,' over the reproductive cycle of an organism that reproduces over many years.
Toward A Pollen Structure Design
Instead of worrying about which individuals are 'inside' and which are 'outside' the population or the neighborhood, or whether we can even define discrete populations in any meaningful way, we choose to center the design on single females, spaced and clustered in ways appropriate for the study or contrast in question. The basic idea is to compare the gametic pollen profiles extracted from different females to learn something about the heterogeneity of the pollen donor pools they have sampled. As Jim Hamrick has put it, each female can be viewed (in essence) as a separate biological pollen trap, spaced out in some convenient pattern. Just to get started, consider the following quartet of situations, each embedded within a larger-scale distribution of the species; that larger distribution is inevitably poorly characterized and not amenable to exhaustive enumeration over any very large spatial scale:



The sampled females are indicated by x's in the diagram below. We want some close spacing (at distances within the scope of a single patch), some intermediate-scale spacing (as for single trees in a dispersed population), and a total separation (say, NW to SE or SW to NE) that should pick up pollen profiles that are different.
From each female, we will extract n seeds. For the sake of definiteness (with the option to adjust sample sizes on further reflection), let us assume n = 50 from each female. From each seed, we want both the male and the female gametic genotypes. Leaving aside the general difficulty of obtaining them for the moment, assume we have n = 50 female gametes, with n = 50 paired male gametes. The idea is to use these gametic genotypes to say something about divergence among gametes, given homogeneous (or heterogeneous) pollen draws for different females, due to (a) patch differences, (b) distance considerations, or (c) whatever else in an ecological context is interesting to look at and differs among the sampled females.
Genetic Markers / Distance Metrics
In general, our intent here is neither to determine the male parent of particular seeds nor even to obtain strong likelihood separation, but rather to determine whether the pollen profiles of different females differ. For the sake of initial discussion, consider a battery of H polymorphic allozyme loci. Some loci may have only two alleles, but most will generally have more. Consider the 4-allele case, which is sufficient to describe the scoring convention. Because each haploid gamete carries a single allele at a locus, the four alleles can be placed at the vertices of a regular (equilateral) tetrahedron (a perfect pyramid), with each edge representing the distance between a particular pair of alleles. For unweighted analysis, we assume that each edge is of unit length, so that the squared distance between any unlike pair of alleles is one (1). The schematic below should suffice to illustrate the point (Peakall et al. 1995):
The squared genetic distance between any pair of male gametes, whether from the same female or from different females, is thus either 1 (different) or 0 (the same) for this locus. If we want to weight different alleles, it is possible to devise inverse-frequency weights, taking values 1/p, 1/q, 1/r, 1/s (for 4 alleles), and so on, but experience suggests that such nuances do not help much in practice. The scheme extends to as many alleles as we might have. The multilocus treatment simply adds the squared genetic distances across loci: we have a separate N x N matrix of pairwise distances for each locus, and the multilocus matrix is the element-by-element sum of the separate matrices. We will therefore have at least an H-dimensional representation.
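The unweighted scoring convention translates directly into code. The minimal Python sketch below, assuming gametes are coded as an N x H array of allele labels (one row per gamete, one column per locus) and using a hypothetical function name, builds one 0/1 matrix per locus and sums them element by element.

```python
import numpy as np


def gametic_distance_matrix(gametes):
    """Unweighted multilocus squared distances among haploid gametes.

    gametes: N x H array of allele codes (rows = gametes, columns = loci).
    At each locus the squared distance is 1 for unlike and 0 for like alleles;
    the multilocus matrix is the element-by-element sum over loci.
    """
    g = np.asarray(gametes)
    n, h = g.shape
    d2 = np.zeros((n, n))
    for locus in range(h):
        col = g[:, locus]
        # Single-locus N x N matrix: 1 where alleles differ, 0 where they match.
        d2 += (col[:, None] != col[None, :]).astype(float)
    return d2


gametes = [[1, 2, 3], [1, 2, 4], [2, 3, 4]]
print(gametic_distance_matrix(gametes))
```

Inverse-frequency weighting, if wanted, would multiply each single-locus term by a weight before summing; it is left out here, as the text suggests it rarely helps.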
Recently, attention has turned to microsatellites, which have larger numbers of alleles, so the multi-allele extension is valuable. With microsatellites, we can also measure along the 'copy number' axis, using the sort of RST measures recommended by Slatkin et al. and Feldman et al., reminiscent of the analogous 'ladder measures' of Richardson and Smouse. The important point is that we want the squared distance for everything we do. Again, for the multilocus distances, we simply sum the squared distances over loci, for each of the N(N-1)/2 pairs of individuals. It is worth noting that some microsatellite loci are so highly variable that rare alleles are not uncommonly new mutants; we want to avoid that complication, so it will be necessary (though easy enough) to choose microsatellite loci that are 'well behaved'.
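Along the copy-number axis, a natural squared distance between two haploid gametes at one locus is the squared difference in repeat number, in the spirit of RST. The minimal Python sketch below assumes gametes are coded as an N x H array of repeat counts and uses a hypothetical function name; it sums the per-locus squared differences exactly as the unordered-allele scheme does.

```python
import numpy as np


def stepwise_distance_matrix(repeat_counts):
    """Squared copy-number differences summed over microsatellite loci.

    repeat_counts: N x H array of repeat numbers (rows = gametes, columns = loci).
    """
    r = np.asarray(repeat_counts, dtype=float)
    # Broadcast to an N x N x H array of pairwise differences per locus.
    diff = r[:, None, :] - r[None, :, :]
    return (diff ** 2).sum(axis=2)


print(stepwise_distance_matrix([[10, 20], [12, 20], [10, 23]]))
```

Because both metrics are squared distances, loci scored either way can be combined by adding their matrices.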
We also need to consider maternally inherited (mtDNA) and paternally inherited (cpDNA in conifers) markers wherever we can get them, not as a replacement for the nuclear markers but as a useful adjunct. That leads us to NST-type measures, in which each 'locus' is separately coded but there is no recombination. In that case, we want the number of substitutions between two multilocus 'haplotypes', measured either phenetically or phylogenetically. All of these measures and types of genetic data can be covered with ΦST methodology.
Partitioning the Variation
We now have an N x N matrix of inter-individual squared genetic distances. We can use AMOVA, Mantel tests, and other multivariate matrix methods to partition variation among various components of the total haplotypic divergence and to search for spatially arrayed patterns, and we have approximately N = 2000 gametes (20 females x 50 seeds; 1000 male and 1000 female gametes) with which to work. We now have: (a) paired male-female multilocus haplotypes, which might or might not be correlated; (b) enough information on each female to assess whether we have Mendelian segregation; (c) a gametic spectrum from the males that will be either more or less dispersed than that from a given female; (d) a separate male spectrum from each female that will (generally) be overdispersed relative to the neutral expectation from a homogeneous male gamete pool; and (e) enough spatial spread and coverage in the females to assess the impact (if any) of physical separation on differential male reproductive contributions.
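As an illustration of the partitioning step, the minimal Python sketch below runs a one-level AMOVA (Excoffier et al. 1992) on a squared-distance matrix, for example grouping the male gametes by the female that bore each seed. It assumes a single hierarchical level and unweighted distances, omits the permutation test normally used for significance, and uses a hypothetical function name.

```python
import numpy as np


def amova_phi_st(d2, groups):
    """One-level AMOVA from an N x N squared-distance matrix.

    groups: length-N labels (e.g. the female whose seed supplied each gamete).
    Returns (sigma2_among, sigma2_within, phi_st).
    """
    d2 = np.asarray(d2, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    g = len(labels)
    # Sums of squared deviations, computed directly from pairwise distances.
    ssd_total = d2.sum() / (2 * n)
    ssd_within = 0.0
    sizes = []
    for lab in labels:
        idx = np.flatnonzero(groups == lab)
        sizes.append(len(idx))
        ssd_within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
    ssd_among = ssd_total - ssd_within
    ms_among = ssd_among / (g - 1)
    ms_within = ssd_within / (n - g)
    sizes = np.array(sizes, dtype=float)
    n0 = (n - (sizes ** 2).sum() / n) / (g - 1)  # average group size, unbalanced design
    sigma2_within = ms_within
    sigma2_among = (ms_among - ms_within) / n0
    return sigma2_among, sigma2_within, sigma2_among / (sigma2_among + sigma2_within)


x = np.array([0, 0, 1, 1])
d2 = (x[:, None] != x[None, :]).astype(float)
print(amova_phi_st(d2, ["f1", "f1", "f2", "f2"]))
```

A positive ΦST among male pools would point to the overdispersion described in (d); negative estimates can occur when pools are more alike than random sampling would suggest.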
All we have to do, in principle, is partition the variation among components. We could do separate partitions among the male gametes of different females, among the female gametes of different females, within and among male-female gamete pairs for a single female, and so on. With standard partitioning techniques [merely invoked here], we can construct a 40 x 40 matrix of average genetic distances among gamete pools (20 female pools and 20 identifiable male pools). We can, among other questions, ask:
(2) Are male pools overdispersed, relative to the female pools? Are they overdispersed, relative to what would be expected from a homogeneous population draw?
(3) Do female gamete pools show any pattern with physical separation? One suspects that there will be no real pattern, over the distances in question, but if we have some tight clusters, there may be some autocorrelation at short distances.
(4) Do male gamete pools show any pattern with that same physical separation, and how does that pattern relate to the female pool? Can we relate it to the area from which pollen is provided for different females?
(5) Is there any way to determine how many males are involved or the extent to which that number varies with different females? In other words, how can we relate overdispersion to the 'number of males' problem?
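Questions (3) and (4) amount to asking whether a matrix of genetic distances among gamete pools is correlated with the matrix of physical distances among the females that produced them. A minimal Python sketch of a simple Mantel test follows, assuming a one-sided test of positive association and a fixed random seed for reproducibility (function and parameter names are hypothetical); spatial autocorrelation correlograms would be the natural extension for detecting structure only at short distances.

```python
import numpy as np


def mantel(a, b, permutations=999, seed=0):
    """Simple Mantel test between two square distance matrices.

    Returns the Pearson r between their upper triangles and a one-sided
    permutation p-value, permuting rows and columns of `a` together.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        r_perm = np.corrcoef(a[np.ix_(p, p)][iu], b[iu])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


coords = np.arange(8.0)
geo = np.abs(coords[:, None] - coords[None, :])  # females spaced along a transect
gen = 2 * geo  # toy genetic distances that track physical separation
print(mantel(gen, geo, permutations=199, seed=1))
```

With only 20 females there are 190 pairwise comparisons, so the power of such a test will depend heavily on how the females are clustered.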
Allelic Richness Measures
Most of the indirect (structure-based) work has been based on analyses of GST, FST, NST or (more recently) RST, each a special case of the ΦST measure of Excoffier et al. (1992), following the basic theme of Wright. Because all such measures depend primarily on the most frequent alleles (haplotypes), they are not very sensitive to the population processes of interest in regional gene flow. We may need some other measure that is more sensitive to the processes under study. Previous work has shown that allelic richness is a far more sensitive gauge of the agglomeration of disparate gene pools than is heterozygosity (or related, structure-like statistics). Slatkin's rare-allele methods, and other sorts of measures, hold promise. Chakraborty et al. (1989) and Neel et al. (1989) have shown that allelic richness, particularly the number of rare alleles, is highly sensitive to the pooling of heterogeneous gene pools. Excoffier et al. (1992) have demonstrated the same phenomenon for mtDNA haplotypes (though they interpreted their results differently). It might even be possible to improve on such techniques by incorporating the level of phylogenetic divergence, though I have my doubts. Additional theoretical work is needed here.
For analytical reasons, we may yet discover that allelic richness is a more informative measure of the mixture phenomenon than structure statistics. There are serious sample-size effects, so we will need to use rarefaction analysis. The spectrum of allele numbers, particularly that of rare allele numbers, is drastically affected by the mixing of divergent gene pools. Quite apart from its interest to conservation biologists, the allelic spectrum may provide powerful statistical clues about longer-range genetic movements.
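Rarefaction removes the sample-size effect by asking how many distinct alleles would be expected in a random subsample of a fixed number of genes g, drawn without replacement: the expectation is the sum over alleles of 1 - C(N - N_i, g) / C(N, g), where N_i is the count of allele i among N sampled genes. A minimal Python sketch of that standard formula follows (the function name and the toy pools are hypothetical).

```python
from collections import Counter
from math import comb


def rarefied_richness(alleles, g):
    """Expected number of distinct alleles in a random subsample of g genes.

    alleles: list of allele labels for the N sampled genes.
    """
    counts = Counter(alleles).values()
    n = sum(counts)
    if g > n:
        raise ValueError("subsample size exceeds sample size")
    total = comb(n, g)
    # math.comb returns 0 when g > n - k, i.e. the allele cannot be missed.
    return sum(1 - comb(n - k, g) / total for k in counts)


pool1 = ["a"] * 18 + ["b"] * 2  # common allele plus one rare allele
pool2 = ["a"] * 18 + ["c"] * 2  # same common allele, a different rare allele
print(rarefied_richness(pool1, 20), rarefied_richness(pool1 + pool2, 20))
```

In this toy case, pooling the two samples raises rarefied richness at g = 20 from 2.0 to about 2.51, while expected heterozygosity barely moves (0.18 in either pool alone versus 0.185 in the pooled sample), which is the contrast the text draws between richness and structure-like statistics.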
Adding Males
If the pollen profile for each female were drawn from a homogeneous pollen cloud with average allele frequencies, we ought to be able to detect departures from homogeneity, but whether we could do much more than that remains to be seen. It seemed to several of us that if we had at least a sample of the local males around each of the females, we could 'anchor' the local pollen cloud frequencies. Since that local pool will differ among females, we might hope to partition the local effects out of the total heterogeneity of pollen profiles among females, and to ask whether (and to what extent) differences in local male composition account for non-homogeneity of the total pollen pool. So we add males around each cluster of sampled females, say 30 (just as a rough rule of thumb), characterize their potential gametic output, and show how uneven their contributions are. It might turn out that the long-distance (from outside the local area) pollen flow is homogeneous across females once we partition out the local male effects. Our results with Cecropia (Kaufman et al. 1998; see also Smouse et al. abstract, Part III of these proceedings) suggest that if the outside contributors are far enough away, their relative distances to the local females are so similar that differential contributions are hard to detect.
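One hypothetical way to "anchor" a female's pollen profile, not something the workshop specified, is to treat it as a two-component mixture: a fraction alpha from the local males (with allele frequencies estimated from the roughly 30 genotyped neighbors) and 1 - alpha from a wider background cloud. The minimal Python sketch below estimates alpha by expectation-maximization, assuming unlinked loci, known paternal gametes, and both frequency sets known without error; all names are hypothetical.

```python
import numpy as np


def local_pollen_fraction(gametes, local_freqs, background_freqs, iters=200):
    """EM estimate of the fraction of a female's pollen from the local male pool.

    gametes: list of multilocus haploid paternal gametes (tuples of alleles).
    local_freqs, background_freqs: one {allele: frequency} dict per locus.
    """
    def likelihood(gamete, freqs):
        # Product over loci (unlinked loci assumed); unseen alleles get probability 0.
        return np.prod([freqs[locus].get(a, 0.0) for locus, a in enumerate(gamete)])

    l_loc = np.array([likelihood(g, local_freqs) for g in gametes])
    l_bg = np.array([likelihood(g, background_freqs) for g in gametes])
    alpha = 0.5
    for _ in range(iters):
        num = alpha * l_loc
        denom = num + (1 - alpha) * l_bg
        # E-step: posterior probability that each gamete came from a local male.
        resp = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
        # M-step: the mixing fraction is the mean responsibility.
        alpha = resp.mean()
    return alpha


local = [{"A": 0.8, "B": 0.2}]
background = [{"A": 0.2, "B": 0.8}]
print(local_pollen_fraction([("A",)] * 70 + [("B",)] * 30, local, background))
```

Gametes carrying alleles absent from both frequency sets are treated here as non-local, which biases alpha downward; in practice the background frequencies would themselves be estimates, adding uncertainty this sketch ignores.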