RandomWalk: Admixtures - what they are, and aren't

Well, I promised some results for this week, so I guess I'd better get writing. Piecing together genetic genealogy is something that sometimes recalls Firesign Theatre's old (and by old, I mean '60s) sketch "Where're you from?" "Nairobi, ma'am. Isn't everybody?" Regardless, "genetic genealogy blogs" have been proliferating, and as hinted earlier there are some tools to help make things a lot easier.

In this post I'll turn our attention to ADMIXTURE; to most people who've looked at personal genetics this may be familiar from GEDmatch's "Ad-Mix Utilities". Unfortunately there are some misconceptions about these calculators, which I'm in part trying to set straight here, while exploring my own ancestry. The first tidbit should be self-evident but doesn't always seem to be: since all of the admixture calculators on GEDmatch give different results, they can't possibly all be correct.

So what, then, does ADMIXTURE do? To give a fairly technical summary, it tries to determine the frequency of each Single Nucleotide Polymorphism in K different populations, and the contributions of those K populations to the genetic makeup of each individual in the analysis. I.e. if SNP 1 has a frequency of 25% in population A and 75% in population B, then getting a copy of SNP 1 from both parents happens 6.25% of the time in population A and 56.25% of the time in population B. All else being equal, an individual with both copies therefore has about a 10% chance of being from population A and a 90% chance of being from population B.

But if they had the SNP (a single base change) from just one parent, then the odds would be 50/50. Still, this wouldn't necessarily mean that they had one parent from each population. Now, if you apply this analysis to 100,000 different SNPs and determine their contribution to 20 different populations for 3,000 individuals, we're much closer to the real situation. The results are still more a probability than a fraction, though, and it all hinges on the correct selection of the sample individuals and the number of populations K.
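To make the arithmetic concrete, here's a minimal sketch of that single-SNP reasoning, extended to any number of SNPs. It is emphatically not what ADMIXTURE itself does (ADMIXTURE has to estimate the frequencies and the ancestry fractions together, and individuals can be mixed); the function name `population_posterior` is made up for this post, and I'm assuming known frequencies, equal prior odds for each population, independent SNPs and an individual who comes entirely from one population.

```python
from math import comb, exp, log


def population_posterior(genotypes, freqs):
    """Posterior probability of each population for one unadmixed individual.

    genotypes: copies of the variant (0, 1 or 2) carried at each SNP.
    freqs: freqs[j][k] is the variant frequency of SNP j in population k,
           strictly between 0 and 1 (otherwise log(0) would be hit).
    Assumes equal priors, independent SNPs and Hardy-Weinberg proportions.
    """
    n_pops = len(freqs[0])
    log_lik = [0.0] * n_pops
    for g, snp_freqs in zip(genotypes, freqs):
        for k, f in enumerate(snp_freqs):
            # Binomial probability of g variant copies out of 2 draws.
            log_lik[k] += log(comb(2, g) * f**g * (1 - f) ** (2 - g))
    # Subtract the largest log-likelihood before exponentiating, so that
    # summing over many SNPs does not underflow to zero.
    top = max(log_lik)
    weights = [exp(ll - top) for ll in log_lik]
    total = sum(weights)
    return [w / total for w in weights]


# SNP 1: 25% in population A, 75% in population B.
both_parents = population_posterior([2], [[0.25, 0.75]])
one_parent = population_posterior([1], [[0.25, 0.75]])
print([round(p, 3) for p in both_parents])  # [0.1, 0.9]
print([round(p, 3) for p in one_parent])    # [0.5, 0.5]
```

Feeding in more SNPs just means longer `genotypes` and `freqs` lists; each SNP nudges the odds a little, which is why the real analysis wants them by the hundred thousand.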

One good article from the Eurogenes blog deals mainly with my opening pitch: because ADMIXTURE compares (or classifies) individuals into different population clusters, it is only going to suggest differences from those clusters. Put another way, if there was a "Finnish" cluster, a Finnish individual might get a result of "100% Finnish" with no information about their ancestral makeup. It turns out this is usually what people want out of genetic genealogy, i.e. to display only recent admixture, but it can still provide surprises.

With those caveats in mind, I set out to run some experiments myself. To cut to the chase and provide a gentle introduction both for myself and any readers, I opted to use a dataset prepared by Razib Khan, over at Discover Magazine's Gene Expression. Incidentally, since there are only limited sources of public SNP genotypes with accompanying ancestry information, that may also be pretty much the only way to start off. I expect to revisit that issue later, however.

Since I'm also interested in my own genetic origins, I merged in my own genotypes. A peculiarity about this is that my known ancestry is about 3/4 Finnish and 1/4 Germanic, while Razib's dataset contains basically no Northern Europeans to compare with. I was curious to find out if I would form my own population cluster (isn't everybody?) or what other populations I would be mapped to. So with that, onward to Finding K...

Saturday, August 31, 2013
