In general terms, these imputation algorithms use genetic variants shared between to-be-imputed genomes and the reference panel and apply Hidden Markov Models (HMM) to impute the missing genotypes per sample ( Das et al., 2018). Standard approaches to genotype imputation utilize Hidden Markov Models (HMM) ( Browning et al., 2018 Das et al., 2016 Rubinacci et al., 2020) distributed alongside large WGS-based reference panels ( Browning and Browning, 2016). Given the massive scale of genotyping required for genome-wide association studies or implementation of genetically informed population health initiatives, genotype imputation is an essential approach in population genetics. This process, called genotype imputation, allows for the generation of nearly the full complement of known common genetic variation at a fraction of the cost of direct genotyping or sequencing. Thus, genotypes for a small subset (1–10%) of observed common genetic variants can be used to infer the genotype status of unobserved but known genetic variation sites across the genome (on the order of ~1 M of >10 M sites Li et al., 2009 Marchini and Howie, 2010). The non-uniform nature of recombination breakpoints producing these genomic blocks results in correlative genotype relationships across genetic variants, known as linkage disequilibrium. The human genome is inherited in large blocks from parental genomes, generated through a DNA-sequence-dependent shuffling process called recombination. Our autoencoder-based genotype imputation strategy achieved superior imputation accuracy across the allele-frequency spectrum and across genomes of diverse ancestry, while delivering at least fourfold faster inference run time relative to standard imputation tools. Here, we demonstrate an autoencoder-based approach for genotype imputation, using a large, commonly used reference panel, and spanning the entirety of human chromosome 22. Artificial neural network-based imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium. This results in computational resource and privacy-risk barriers to access to cutting-edge imputation techniques. Standard statistical imputation approaches rely on the co-location of large whole-genome sequencing-based reference panels, powerful computing environments, and potentially sensitive genetic study data. Genotype imputation is a foundational tool for population genetics.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |