The analysis of the data took much longer than I anticipated, largely because the power outage I had didn't turn out to be an isolated incident. My computer rig outgrew my office uninterruptible power supplies long ago, but I never thought that would be an issue because we experience power disruptions maybe once or twice a year at most. For some reason, the power disruptions carried on daily for days, then stretched to weeks... It made me feel like I was living in some third world country.
The analysis had reached a phase where each new run took almost a day, and with a daily power outage you can do the math. Worse, there was some filesystem corruption from the sudden computer reboots, so after a few days I just gave up and decided to wait out the blackouts. In truth I finished the current run of analysis a while ago, but I've been keeping busy with my job - and still am - so I don't have time for much finesse (as usual).
Incidentally, I guess that's not a huge problem; this is more a research journal than a true blog, though I'm hoping to offer more digestible content and possibly regular updates as time goes by. For now this is more like a log of my research, and I don't expect to have many readers beyond those who may find this by googling some of the specific things mentioned here. I'll say welcome to any potential readers, and feel free to say hello.
Speaking of which, a rather sizeable dump of the results from my latest run is available. The time crunch means I still haven't got a decent way to visualize it; instead I'm just listing the 25 highest-affinity samples for each genetic similarity cluster at each tested K value. As before, the clusters form on their own, unsupervised, and vary - often back and forth - between K values, so this time I've simply ordered the clusters from left to right according to my own genetic makeup in the analysis rather than try to line them up in any absolute sense. I've run the analysis for K = 1 to 40 for now. Incidentally, K = 40 is also the first K value at which Orcadians separate into their own cluster in the analysis. That is a good point to pause and reflect on whether this analysis is generating anything sensible. I expect to have more to write on these things in the future.
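For anyone curious how a listing like this can be put together, here is a minimal sketch rather than the exact script I used. It assumes the run's output has already been loaded as a list of per-sample cluster proportions (one row per sample, one column per cluster, as in an ADMIXTURE-style Q matrix) alongside a list of sample IDs. The function names, the `ref_index` parameter, and the toy data are all made up for illustration, and "ordering by my own genetic makeup" is interpreted here as sorting clusters by the reference sample's proportions, largest first.

```python
def order_clusters(q_rows, ref_index):
    """Cluster indices sorted by the reference sample's proportion, largest first."""
    ref = q_rows[ref_index]
    return sorted(range(len(ref)), key=lambda k: ref[k], reverse=True)


def top_samples(q_rows, sample_ids, cluster, n=25):
    """The n samples with the highest proportion in the given cluster."""
    ranked = sorted(
        zip(sample_ids, q_rows),
        key=lambda pair: pair[1][cluster],
        reverse=True,
    )
    return [(sid, row[cluster]) for sid, row in ranked[:n]]


def summarize(q_rows, sample_ids, ref_index, n=25):
    """For one K value: (cluster, top samples) pairs in reference-based order."""
    return [
        (k, top_samples(q_rows, sample_ids, k, n))
        for k in order_clusters(q_rows, ref_index)
    ]


# Toy data (hypothetical): four samples, K = 3, first row is the reference.
sample_ids = ["me", "s1", "s2", "s3"]
q_rows = [
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
    [0.30, 0.60, 0.10],
]

if __name__ == "__main__":
    for cluster, members in summarize(q_rows, sample_ids, ref_index=0, n=2):
        print(cluster, members)
```

Ordering relative to one reference sample sidesteps the problem that cluster numbering is arbitrary between runs, at the cost that the left-to-right order only means something relative to that one person.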
There's a reason I'm hurrying this update out - 23andMe has finally announced and started to roll out their new Ancestry Composition update. Other genetic ancestry blogs have already broken the news all around, so I probably shouldn't waste many words on it. In short, the 23andMe Ancestry Composition is being expanded to recognize 31 different populations; the additional ones are Japanese, Korean, Yakut, Mongolian, Chinese, Southeast Asian, West African, South African and Central & South African. With these additions there will be less need for anyone to run their own ADMIXTURE analysis except for learning (or research) purposes. As far as I understand, though, no customers have received their results for the new populations yet.