Analysis of molecular variance (AMOVA)

Introduction

We’ve already encountered $\pi$, the nucleotide diversity in a population, namely

\[ \pi = \sum_{ij} x_i x_j \delta_{ij} , \]

where $x_i$ is the frequency of the $i$th haplotype and $\delta_{ij}$ is the fraction of nucleotides at which haplotypes $i$ and $j$ differ.¹ It shouldn’t come as any surprise to you that just as there is interest in partitioning diversity within and among populations when we’re dealing with simple allelic variation, i.e., Wright’s $F$-statistics, there is interest in partitioning diversity within and among populations when we’re dealing with nucleotide sequence or other molecular data. We’ll see later that AMOVA can be used very generally to partition variation when there is a distance we can use to describe how different alleles are from one another, but for now let’s stick with nucleotide sequence data and think of $\delta_{ij}$ simply as the fraction of nucleotide sites at which two sequences differ.

¹ When I introduced nucleotide diversity before, I defined $\delta_{ij}$ as the number of nucleotides that differ between haplotypes $i$ and $j$. It’s a little easier for what follows if we think of it as the fraction of nucleotides at which they differ instead.

© 2001-2010 Kent E. Holsinger

Analysis of molecular variance (AMOVA)

The notation now becomes just a little bit more complicated. We will now use $x_{ik}$ to refer to the frequency of the $i$th haplotype in the $k$th population. Then

\[ x_{i\cdot} = \frac{1}{K} \sum_{k=1}^{K} x_{ik} \]

is the mean frequency of haplotype $i$ across all populations, where $K$ is the number of populations. We can now define

\[ \pi_t = \sum_{ij} x_{i\cdot} x_{j\cdot} \delta_{ij} \]

\[ \pi_s = \frac{1}{K} \sum_{k=1}^{K} \sum_{ij} x_{ik} x_{jk} \delta_{ij} , \]

where $\pi_t$ is the nucleotide sequence diversity across the entire set of populations and $\pi_s$ is the average nucleotide sequence diversity within populations. Then we can define

\[ \Phi_{st} = \frac{\pi_t - \pi_s}{\pi_t} , \tag{1} \]

which is the direct analog of Wright’s $F_{st}$ for nucleotide sequence diversity. Why? Well, that requires you to remember stuff we covered eight or ten weeks ago. To be a bit more specifi


