RandomWalk: Biology 101

On the post topic: Well, not really.

As related earlier, the genetic testing companies do give you access to your raw genotype data, and I have a penchant for taking things apart to find out what they're made of... This, in turn, can only lead to one thing: Genetic engineering! Okay, kidding on that as well, but there are some analyses I'd like to share.

I'm going to make some posts on genotypes and their analysis, though, and for any of that to be understandable I guess there are a handful of biological basics that need to be covered. To most people, I imagine, these should be fairly familiar from basic biology lessons, but after a decade or few, it'd be no surprise to have forgotten some, and a refresher never hurts.

First I have to cover what is meant by "raw genotype data". The data consists of SNPs, or Single Nucleotide Polymorphisms: positions where a single base in the genetic code has changed into another. 23andMe also tests for a limited number of deletions, insertions and substitutions, which are pretty much what the names imply, but the majority of their raw results are SNPs. In general, SNPs are considered UEPs, "Unique-Event Polymorphisms". What does this mean?
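As a rough illustration of what such raw data looks like, here is a minimal Python sketch that reads a few lines in the tab-separated layout that 23andMe-style raw files are assumed to use: comment lines starting with `#`, then rsid, chromosome, position and genotype columns. The sample lines are made up, and treating `--` as a no-call and non-ACGT letters as insertion/deletion calls are assumptions for illustration, not a specification of the actual file format.

```python
import csv
import io

# Made-up sample in the assumed layout: '#' comments, then
# rsid, chromosome, position and genotype separated by tabs.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t100\tAG\n"
    "rs0000002\t1\t200\tCC\n"
    "rs0000003\t2\t300\t--\n"
    "rs0000004\t3\t400\tDI\n"
)

BASES = set("ACGT")


def read_raw_genotypes(text):
    """Yield (rsid, chromosome, position, genotype) for each data line."""
    rows = (line for line in io.StringIO(text) if not line.startswith("#"))
    for rsid, chrom, pos, genotype in csv.reader(rows, delimiter="\t"):
        yield rsid, chrom, int(pos), genotype


def classify(genotype):
    """Rough category of one genotype call."""
    if genotype == "--":
        return "no-call"  # assumed no-call marker
    if set(genotype) <= BASES:
        return "homozygous" if len(set(genotype)) == 1 else "heterozygous"
    return "other"  # assumed: insertion/deletion style calls such as "DI"


for rsid, chrom, pos, genotype in read_raw_genotypes(SAMPLE):
    print(rsid, chrom, pos, genotype, classify(genotype))
```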

The human genome has 23 chromosome pairs with about 3 billion base pairs among them. But about 99.9% of them are the same among all living people (and still about 99% are shared with chimpanzees). This is according to Wikipedia; dbSNP currently seems to list at least 60,558,600 known SNPs for Homo sapiens, which is 2% of 3 billion, so the counts seem to differ. The gap is probably because the 99.9% figure compares any two people, while dbSNP catalogues every variant position observed in anyone. Either way, the amount is far less than the whole genome, so why not just read the locations likely to differ? This is what "genotyping" does, with the direct-to-consumer tests checking around a million representative SNPs.
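The figures in that paragraph are easy to check. This short Python snippet just redoes the arithmetic with the numbers quoted above (3 billion base pairs, 60,558,600 dbSNP entries, about a million tested SNPs); the constant names are only labels for those quoted numbers.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # "3 billion base pairs" from the text
DBSNP_COUNT = 60_558_600           # dbSNP count quoted above
CHIP_SNPS = 1_000_000              # "around a million" tested SNPs


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


# 0.1% of the genome, computed with integers to avoid rounding noise
differing_between_two_people = GENOME_BASE_PAIRS // 1000

print(f"0.1% of the genome: {differing_between_two_people:,} positions")
print(f"dbSNP share of the genome: {percent(DBSNP_COUNT, GENOME_BASE_PAIRS):.2f}%")
print(f"chip share of dbSNP: {percent(CHIP_SNPS, DBSNP_COUNT):.2f}%")
```

That works out to 3,000,000 positions, 2.02% and 1.65% respectively, so even a million-SNP test reads only a small slice of the catalogued variation.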

I like Wikipedia's definition of them as "represent the inheritance of events it is believed can be assumed to have happened only once in all human history", or more specifically "In genetic genealogy a unique-event polymorphism (UEP) is a genetic marker that corresponds to a mutation that is likely to occur so infrequently that it is believed overwhelmingly probable that all the individuals who share the marker, worldwide, will have inherited it from the same common ancestor, and the same single mutation event."

Unfortunately for genealogists everywhere, I suppose, Homo sapiens has been around for hundreds of thousands of years, and many SNPs probably predate even that. Consequently, those individual SNPs have ended up all over the place, and it's not generally possible to point to any single SNP and say "Only our clan has that". Instead, people carry a random mix of them, and only their frequencies vary between populations. When people from those populations have children together, some of that structure remains, in what is called admixture.
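To make the "only the frequencies vary" point concrete, here is a toy Python sketch of estimating one person's admixture proportion between two source populations by maximum likelihood. Everything in it is an assumption for illustration: the allele frequencies are made up, SNPs are treated as independent, genotypes are assumed to follow Hardy-Weinberg proportions, and a plain grid search stands in for a proper optimizer. Real admixture tools also estimate the population frequencies themselves and handle more than two populations.

```python
import math

# Hypothetical frequencies of allele "A" at five SNPs in two made-up
# source populations, P1 and P2.
FREQ_P1 = [0.9, 0.2, 0.7, 0.1, 0.6]
FREQ_P2 = [0.3, 0.8, 0.2, 0.5, 0.1]


def genotype_probability(copies_of_a, p):
    """Probability of carrying 0, 1 or 2 copies of allele A when its
    frequency is p, assuming Hardy-Weinberg proportions."""
    return math.comb(2, copies_of_a) * p ** copies_of_a * (1 - p) ** (2 - copies_of_a)


def log_likelihood(genotypes, q):
    """Log-likelihood of the genotypes if a fraction q of the ancestry comes
    from P1 and 1 - q from P2, treating SNPs as independent."""
    total = 0.0
    for copies_of_a, f1, f2 in zip(genotypes, FREQ_P1, FREQ_P2):
        p = q * f1 + (1 - q) * f2  # allele frequency for the mixed ancestry
        total += math.log(genotype_probability(copies_of_a, p))
    return total


def estimate_admixture(genotypes, steps=100):
    """Return the P1 fraction q on a grid over [0, 1] with the highest likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(genotypes, q))
```

For example, `estimate_admixture([2, 0, 2, 0, 2])` follows the made-up P1 frequencies at every SNP and lands at 1.0, while an all-heterozygous `[1, 1, 1, 1, 1]` ends up somewhere in between.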

To understand how those SNP frequencies can survive, I guess a good reference would be The Process of Meiosis, from which I shall shamefully link this illustration. The blue and red chromosomes represent a matching chromosome pair within an individual (one copy inherited from each of their parents), already duplicated and separated from each other. After the meiosis shown here, each duplicated chromosome still splits down the middle, forming four gametes, one of which will pass to the child. This process happens independently for each chromosome pair within the genome.
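A minimal Python sketch of how crossovers mix one chromosome pair into a gamete, under simplifying assumptions: markers are a plain list, each gap between adjacent markers gets a crossover with a fixed, hypothetical probability `crossover_prob`, and the detail that crossing over happens between duplicated chromatids is ignored.

```python
import random


def make_gamete(copy_a, copy_b, crossover_prob=0.01, rng=random):
    """Build one gamete's copy of a chromosome from a parent's two
    homologous copies. Between adjacent markers, a crossover (a switch to
    the other copy) happens with the hypothetical rate crossover_prob."""
    if len(copy_a) != len(copy_b):
        raise ValueError("homologous copies must have the same markers")
    copies = (copy_a, copy_b)
    source = rng.randrange(2)  # which copy the gamete starts from
    gamete = []
    for i in range(len(copy_a)):
        if i > 0 and rng.random() < crossover_prob:
            source = 1 - source  # crossover: continue from the other copy
        gamete.append(copies[source][i])
    return gamete


rng = random.Random(1)
print("".join(make_gamete(list("B" * 30), list("R" * 30), crossover_prob=0.05, rng=rng)))
```

With a low rate like this, the output comes out as long runs of `B` and `R` rather than a per-marker coin flip.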

This means the child doesn't receive a random shuffle of the parent's genotypes; instead, the genotypes they receive stay intact between the recombination events, or "crossovers", along the genome. That is to say, "genetically close" genotypes, and thus genes, are likely to be passed together to the offspring. These recombination events happen rarely enough that it is possible to track both relatives (more or less identical stretches of genotypes) and ancestral populations (ranges where the genotype distribution is typical for specific populations).
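One standard way to put a number on "genetically close" is Haldane's map function. It assumes crossovers occur independently along the chromosome (a Poisson process with no interference) and converts a genetic distance d in Morgans into the recombination fraction r = (1 - e^(-2d)) / 2, the chance that two markers end up on different parental copies in a gamete. A minimal Python version of that textbook model, offered as an illustration rather than as what any particular genotype analysis uses:

```python
import math


def haldane_recombination_fraction(distance_morgans):
    """Chance that two markers distance_morgans apart are split by an odd
    number of crossovers (and so come from different parental copies),
    under Haldane's no-interference model."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


for centimorgans in (1, 10, 50, 200):
    r = haldane_recombination_fraction(centimorgans / 100)
    print(f"{centimorgans:>4} cM apart: recombination fraction {r:.3f}")
```

Markers 1 cM apart get separated in only about 1% of gametes, while markers very far apart approach the 50% expected for markers on different chromosomes.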

I guess that is enough for one post.

Sunday, August 18, 2013