Impact of genetic similarity on imputation accuracy.

Abstract:

BACKGROUND: Genotype imputation is a common technique in genetic research. Genetic similarity between the target population and the reference dataset is crucial for high-quality results. Although several reference panels are available, it is often unclear which one is optimal for a particular target dataset. Maximizing genetic similarity between the study sample and candidate reference panels may be the most straightforward way to select the genetically best-matched reference. However, the impact of genetic similarity on imputation accuracy has not yet been studied in detail.

RESULTS: We performed a simulation study in 20 ethnic groups obtained from POPRES. High-quality SNPs were masked and re-imputed with MaCH, MaCH-minimac and IMPUTE2 using four different HapMap reference panels (CEU, CHB-JPT, MEX and YRI). Imputation accuracy was assessed by several statistics. Genetic similarity between ethnic groups and reference populations was measured by F-statistics (F(ST)), originally proposed by Wright, and G-statistics (G(ST)), introduced by Nei and others. To assess how well these measures predict imputation accuracy, we analysed the relations between them and the corresponding imputation accuracy scores. We found that population genetic distances between homogeneous reference and target populations were strongly linearly correlated with the resulting imputation accuracies, irrespective of the distance measure, imputation accuracy measure, missingness and imputation software used. A possible exception was the African population.

CONCLUSION: Using G(ST) or F(ST)-related measures to predict the optimal reference panel is highly recommended for imputation frameworks that rely on a specific reference. A cut-off of G(ST) < 0.01 is recommended to achieve good imputation results for high-frequency variants and small data sets. The linear relationship is less pronounced for low-frequency variants, for which we also observed a dependence of imputation accuracy on the number of polymorphic sites in the reference. We also show that the software-specific measures MaCH-Rsq and IMPUTE-info must be interpreted with caution if the genetic distance between target and reference population is high.
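
To make the G(ST) cut-off concrete, the following is a minimal sketch of how a G(ST)-type distance between a target and a reference population could be computed from allele frequencies and compared against the recommended threshold of 0.01. It assumes biallelic SNPs, equal weighting of the two populations, and no sample-size correction. The function names `nei_gst` and `passes_cutoff` are hypothetical, and the estimator shown is the textbook form of Nei's G(ST), which is not necessarily the exact estimator used in the study.

```python
def nei_gst(freqs_target, freqs_reference):
    """Multi-locus G_ST between two populations (illustrative sketch).

    Inputs are lists of allele frequencies (one per biallelic SNP, same
    SNP order in both lists). Assumes equal population weights and
    applies no sample-size correction.
    """
    if len(freqs_target) != len(freqs_reference):
        raise ValueError("frequency lists must cover the same SNPs")

    h_s_sum = 0.0  # summed within-population heterozygosity
    h_t_sum = 0.0  # summed total (pooled) heterozygosity
    for p_t, p_r in zip(freqs_target, freqs_reference):
        # Mean expected heterozygosity within the two populations
        h_s = (2 * p_t * (1 - p_t) + 2 * p_r * (1 - p_r)) / 2
        # Expected heterozygosity at the averaged allele frequency
        p_bar = (p_t + p_r) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        h_s_sum += h_s
        h_t_sum += h_t

    if h_t_sum == 0:
        return 0.0  # no polymorphic site: no measurable differentiation
    # Ratio of sums across loci rather than a mean of per-locus ratios
    return (h_t_sum - h_s_sum) / h_t_sum


def passes_cutoff(gst, cutoff=0.01):
    """Check a G_ST value against the cut-off quoted in the abstract."""
    return gst < cutoff
```

In this sketch, a candidate reference panel would be preferred when `passes_cutoff(nei_gst(target, reference))` is true, and among several panels the one with the smallest value would be the best-matched by this criterion.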

PubMed ID: 26193934

Projects: Genetical Statistics and Systems Biology, LIFE - Leipzig Research Center for Civilization Diseases

Publication type: Not specified

Journal: BMC Genet

Human Diseases: No Human Disease specified

Citation: BMC Genet. 2015 Jul 22;16:90. doi: 10.1186/s12863-015-0248-2.

Date Published: 22nd Jul 2015

Registered Mode: by PubMed ID

Authors: N. R. Roshyara, M. Scholz

Activity

Views: 2023

Created: 9th May 2019 at 10:56

Last updated: 7th Dec 2021 at 17:58

Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig