Improving Genotyping Efficiency in Pharmacogenomics: Importance of Phasing

Monika Opalek, PhD

Pharmacogenomics (PGx) is a powerful field in personalized medicine that focuses on how an individual's unique genetic makeup affects their response to medications. By studying genetic variants that affect drug metabolism, efficacy, and toxicity, PGx can guide individualized treatments and ensure optimal drug selection and dosing. The overall aim is to maximize therapeutic efficacy while minimizing adverse effects.

For PGx testing, determining whole haplotypes rather than identifying individual SNPs is essential. Specific haplotypes are represented by star-alleles, denoted with an asterisk (*) followed by a number, with each star-allele indicating a unique combination of genetic variants within a pharmacogene. 

Accurately determining a star-allele is challenging because many pharmacogenes are highly polymorphic. They carry single nucleotide variants (SNVs), small insertions and deletions (INDELs), and larger structural variants (SVs) such as duplications (DUPs), deletions (DELs), tandem rearrangements, and hybrid genes (e.g., between CYP2D6 and the CYP2D7 pseudogene). Traditional short-read sequencing technologies often struggle to map reads in these regions because of sequence similarity, which leads to ambiguous alignments and poor mapping quality. The result can be incorrect genotyping, namely assignment of the wrong star-allele.

Phasing

Phasing is a crucial step in haplotype determination and in PGx analysis overall. Its purpose is to determine whether given variants are located on the same chromosome copy or on opposite copies, i.e., whether they were inherited together from one parent or not. We have optimized the phasing process in the PGx workflow for long-read sequencing technology using the PEPPER-Margin-DeepVariant pipeline (Shafin et al., 2021).
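To make the same-chromosome versus opposite-chromosome question concrete, here is a minimal sketch of how it can be read from VCF genotype fields. In VCF, a phased genotype is written with a pipe (0|1), an unphased one with a slash (0/1), and the PS field identifies the phase set a call belongs to. The function names are hypothetical, and the sketch assumes diploid calls with no missing alleles; it is not part of the PEPPER-Margin-DeepVariant pipeline.

```python
def parse_gt(gt: str):
    """Split a VCF GT string into (alleles, is_phased)."""
    phased = "|" in gt
    sep = "|" if phased else "/"
    alleles = tuple(int(a) for a in gt.split(sep))
    return alleles, phased


def cis_or_trans(gt_a: str, gt_b: str, ps_a=None, ps_b=None) -> str:
    """Classify two variant calls as 'cis', 'trans', 'absent', or 'unknown'.

    ps_a / ps_b are the phase-set (PS) values of the two calls. Phase is only
    comparable within one phase set; two missing PS values (None) are treated
    here as belonging to the same set, which is a simplifying assumption.
    """
    a, a_phased = parse_gt(gt_a)
    b, b_phased = parse_gt(gt_b)
    if not any(a) or not any(b):
        return "absent"   # at least one variant is not present at all
    if not (a_phased and b_phased) or ps_a != ps_b:
        return "unknown"  # unphased, or phased in different blocks
    # cis: some haplotype (same index in both GTs) carries both alternate alleles
    shared = any(x > 0 and y > 0 for x, y in zip(a, b))
    return "cis" if shared else "trans"


print(cis_or_trans("0|1", "0|1", 1, 1))  # both variants on the second haplotype
print(cis_or_trans("0|1", "1|0", 1, 1))  # variants on opposite haplotypes
print(cis_or_trans("0/1", "0/1"))        # unphased calls cannot be resolved
```

Unphased heterozygous calls return "unknown" by design: without phase information the two arrangements cannot be told apart.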

Below is an example of how phasing can change the outcome of the analysis. The same sample was sequenced using short-read (Illumina) and long-read (Oxford Nanopore) technologies. Analysis of the unphased short reads for the CYP4F2 gene shows two heterozygous variants (top panel of the IGV screenshot). Without phase information, these were interpreted as two non-reference alleles, giving the diplotype CYP4F2*2/*3. Analysis of the phased long reads, however, shows that both variants are located on the same chromosome (reads marked in blue in the IGV screenshot), which means the sample carries one reference allele (*1) and one *4 allele. The resulting diplotype is CYP4F2*1/*4.

IGV screenshot with two marked variants (chr19:15879621 and chr19:15897578) in the CYP4F2 gene. The top panel shows unphased reads from Illumina short-read sequencing; the bottom panel shows phased reads (phases marked by color) from Oxford Nanopore long-read sequencing.

PharmVar table comparing variants present in discussed haplotypes of the CYP4F2 gene
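The CYP4F2 example above can be sketched in a few lines of Python. The allele table below is a simplified illustration: it follows the text in treating *2 and *3 as single-variant alleles and *4 as the allele carrying both variants, but which of the two positions belongs to *2 and which to *3 is an assumption made here for the sake of the example and should be checked against the PharmVar table. The function name is hypothetical.

```python
# Simplified, illustrative star-allele table keyed by the set of variant
# positions a haplotype carries. The position-to-allele assignment for
# *2 and *3 is an assumption; consult PharmVar for the real definitions.
STAR_ALLELES = {
    frozenset(): "*1",
    frozenset({"chr19:15897578"}): "*2",
    frozenset({"chr19:15879621"}): "*3",
    frozenset({"chr19:15879621", "chr19:15897578"}): "*4",
}


def call_diplotype(phased_gts: dict[str, str]) -> str:
    """Build two haplotypes from phased GT strings and name each one."""
    haplotypes = [set(), set()]
    for position, gt in phased_gts.items():
        for index, allele in enumerate(gt.split("|")):
            if allele != "0":
                haplotypes[index].add(position)
    names = sorted(STAR_ALLELES.get(frozenset(h), "unknown") for h in haplotypes)
    return "CYP4F2" + "/".join(names)


# Both variants on the same haplotype (cis), as seen in the phased long reads
cis = {"chr19:15879621": "0|1", "chr19:15897578": "0|1"}
# The arrangement assumed when the unphased short reads were interpreted (trans)
trans = {"chr19:15879621": "0|1", "chr19:15897578": "1|0"}

print(call_diplotype(cis))    # CYP4F2*1/*4
print(call_diplotype(trans))  # CYP4F2*2/*3
```

The two inputs contain exactly the same heterozygous variants; only the phase differs, and that alone changes the reported diplotype.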

PGx genotyping at Intelliseq

In our PGx workflows, star allele diplotypes are identified using our proprietary Intelliseq Polygenic tool. Intelliseq Polygenic is a toolkit for a wide range of polygenic score analysis tasks and is a core component of the PGx workflow. It identifies the most likely diplotypes based on individual star-allele prediction models for pharmacogenes. The star allele definitions and models for pharmacogenes are based on the nomenclature provided by the PharmVar and PharmGKB databases. Polygenic uses phased and unphased information about SNVs and SVs from the gVCF file in combination with YAML-based models for pharmacogenes and outputs detailed information about possible and most probable diplotypes.
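To illustrate why a tool may report several possible diplotypes when only unphased data is available, here is a minimal sketch that enumerates every pair of star alleles consistent with a set of unphased genotype counts. It is not the Polygenic implementation and does not use its model format; the allele table, the position-to-allele assignment, and the function name are all hypothetical and chosen to mirror the CYP4F2 example.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical, simplified allele definitions (not a real Polygenic model)
ALLELES = {
    "*1": frozenset(),
    "*2": frozenset({"chr19:15897578"}),
    "*3": frozenset({"chr19:15879621"}),
    "*4": frozenset({"chr19:15879621", "chr19:15897578"}),
}


def candidate_diplotypes(alt_counts: dict[str, int]) -> list[str]:
    """List allele pairs whose combined variant counts match the observation.

    alt_counts maps each variant position to the number of alternate-allele
    copies seen in the sample (0, 1, or 2), with no phase information.
    """
    observed = Counter({pos: n for pos, n in alt_counts.items() if n})
    matches = []
    for a, b in combinations_with_replacement(sorted(ALLELES), 2):
        combined = Counter(ALLELES[a]) + Counter(ALLELES[b])
        if combined == observed:
            matches.append(f"{a}/{b}")
    return matches


# Heterozygous at both positions, phase unknown: two diplotypes fit equally well
print(candidate_diplotypes({"chr19:15879621": 1, "chr19:15897578": 1}))
# ['*1/*4', '*2/*3']
```

Phase information is what allows one of these candidates to be ruled out; without it, a caller has to rank them by some other criterion.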

Screenshot from the Polygenic tool README file showing an example of the star-allele prediction model for the CYP2D6 pharmacogene.

Summary

Pharmacogenomics (PGx) aims to personalize healthcare by interpreting genetic variants that influence drug response. This process focuses on identifying star alleles, which represent specific combinations of genetic variants within a haplotype. Due to the high level of polymorphism within pharmacogenes, the genotyping process is particularly challenging. This challenge can be addressed with long-read sequencing technology, which improves the phasing process. By optimizing the phasing process and using advanced tools and models such as Intelliseq's Polygenic, PGx analysis becomes more accurate. 

Learn more about our solutions within the PGx workflow by requesting a demo with one of our experts! 

References:

Shafin, K., Pesout, T., Chang, P.-C., Nattestad, M., Kolesnikov, A., Goel, S., Baid, G., Kolmogorov, M., Eizenga, J. M., Miga, K. H., Carnevali, P., Jain, M., Carroll, A., & Paten, B. (2021). Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nature Methods, 18(11), 1322–1332. https://doi.org/10.1038/s41592-021-01299-w 

Want to know more?

Get in touch with us.