Step 1: Variant evaluation in terms of fulfilling ACMG/AMP minor categories criteria
The American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) minor categories criteria concern gene and variant features that support a variant's pathogenic or benign character [1, 2]. The following characteristics are analyzed:
Implemented ACMG minor categories (names as in ):
Step 2: Variant assignment to major ACMG/AMP categories
For each of the aforementioned ACMG/AMP minor categories, a given variant receives a score ranging from 0 to 1. The score reflects the certainty of the assignment, with higher scores denoting higher confidence. The final ACMG score is a weighted sum of these subcategory scores. We use negative weights for the benign subcategories and positive weights for the pathogenic ones. The absolute value of each weight corresponds to the importance of the given subcategory. Finally, we classify the variant into one of the major categories (pathogenic, likely pathogenic, benign, likely benign, or of uncertain significance) according to the final score (see Figure 1). Note, however, that the score itself constitutes a continuous pathogenicity predictor, which may be particularly convenient for reevaluating variants of uncertain significance. See  and the METHODS section below for detailed information on the score-combining algorithm.
Figure 1. ACMG score and the corresponding categories
[1] S. Richards et al., Standards and Guidelines for the Interpretation of Sequence Variants: A Joint Consensus Recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology, Genet. Med. Off. J. Am. Coll. Med. Genet., vol. 17, no. 5, pp. 405-424, May 2015.
[2] A. N. Abou Tayoun et al., Recommendations for Interpreting the Loss of Function PVS1 ACMG/AMP Variant Criterion, Hum. Mutat., vol. 39, no. 11, pp. 1517-1524, 2018. Online: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544753/
[3] S. V. Tavtigian et al., Modeling the ACMG/AMP Variant Classification Guidelines as a Bayesian Classification Framework, Genet. Med. Off. J. Am. Coll. Med. Genet., vol. 20, no. 9, pp. 1054-1060, Sep. 2018.
We downloaded the up-to-date ClinVar .vcf file and annotated the variants with our pipeline, adding the information necessary for the ACMG/AMP criteria evaluation (frequencies, conservation scores, functional annotations, etc.). We also performed initial variant filtering, removing variants with high frequencies (5% and more) and variants not affecting genes (SnpEff MODIFIERS). Then, we assessed variant pathogenicity with our software implementing the ACMG/AMP standards. Finally, we analyzed the concordance between the ClinVar significance (an annotation already present in the downloaded file) and our pathogenicity predictions. Figures 2A and 2B summarize the results, showing high concordance between the ClinVar and ACMG variant assignments. Our method performs especially well for the pathogenic and likely pathogenic variants.
Figure 2. A) Assignment of 520394 variants of known clinical significance (CLINVAR SIGNIFICANCE) to the ACMG categories. B) Comparison of the ACMG scores obtained for variants from different ClinVar-based significance groups.
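The initial filtering described above can be sketched as follows. This is a minimal illustration, not our pipeline code: the function name and arguments are hypothetical, we assume that a variant "not affecting genes" is one whose SnpEff annotations all carry the MODIFIER impact, and we assume that variants without frequency data are kept.

```python
from typing import Iterable, Optional

MAX_FREQUENCY = 0.05  # variants with a frequency of 5% or more are removed


def passes_initial_filter(frequency: Optional[float], impacts: Iterable[str]) -> bool:
    """Return True if a variant survives the initial filtering.

    frequency: overall population frequency, or None when unknown
               (assumption: variants without frequency data are kept).
    impacts:   SnpEff putative impact of each annotation of the variant.
    """
    if frequency is not None and frequency >= MAX_FREQUENCY:
        return False
    # Assumption: keep the variant only if at least one annotation
    # has an impact other than MODIFIER.
    return any(impact != "MODIFIER" for impact in impacts)
```

For example, a variant with frequency 0.01 and impacts MODIFIER and MODERATE is kept, while a variant annotated only as MODIFIER is removed regardless of its frequency.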
‘PVS1 null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multiexon deletion) in a gene where the loss of protein function (LoF) is a known mechanism of disease.’
1. Examination of whether a given variant may lead to loss of protein function. This step is based on the SnpEff ANN field:
In the text below, such transcripts are referred to as “reliable”.
If the variant does not fulfill the above condition, we omit other tests, and the variant’s final PVS1 score is 0. Otherwise, we apply additional tests described below.
2. Examination of the LoF annotation confidence based on the VEP CSQ and loftee algorithm (Karczewski et al., 2020). Assignment of the variant_score.
Only transcripts with the same id as the “reliable” ones are analyzed.
3. Examination of the LoF intolerance of the mutated genes (based on ClinVar and gnomAD data). Assignment of the gene_score.
4. Calculation of the final PVS1 scores as:
‘PS1 Same amino acid change as a previously established pathogenic variant regardless of nucleotide change.’
1. Evaluation of whether the analyzed variant leads to a protein change (based on the SnpEff ANN field, performed for the protein_coding transcripts with no error/warning messages assigned). If not, we stop the analysis, and the variant’s final PS1 score is 0.
2. Comparison of the protein changes caused by the variant with the ClinVar Pathogenic, Pathogenic/Likely Pathogenic, and Likely Pathogenic records (annotated with SnpEff).
To identify identical changes, we compare gene names, transcript ids, and protein sequence changes (this task is performed differently for different mutation types, see below). We do not rely on chromosomal coordinates and alleles alone, as nucleotide changes at different positions may lead to the same alteration at the protein sequence level.
Mutation types and conditions for the same protein sequence change criterion:
3. Assignment of the final PS1 score:
‘PS3 Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product.’
1. Evaluation of whether the analyzed variant leads to a missense change (based on the SnpEff ANN field, performed for the protein_coding transcripts with no error/warning messages assigned). If not, we stop the analysis, and the variant’s final PS3 score is 0.
2. Evaluation of whether the UniProt database describes a change identical or similar to the one caused by the analyzed variant as leading to reduced performance or hyperactivity of the protein (UniProt category: MUTAGENESIS). To be analyzed, a UniProt mutation must affect only a single amino acid (start position = end position, and a description without the ‘when associated with’ phrase).
3. Assignment of the final PS3 score:
Identical mutation: the same gene, position, reference amino acid, and altered amino acid.
Similar mutation: the same gene, position, and reference amino acid, but a different altered amino acid.
The UniProt database provides the mutagenesis experiment outcomes only as text descriptions. We classified them as reporting a decrease or an increase in protein performance using a custom natural language processing script.
In this category, we analyze the known effects of the variant on protein performance only.
We analyze the known associations with the phenotype or disease in the PS1 and PP4 ACMG categories. Thus, we add independent information to the final classification.
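The definitions of identical and similar mutations given above can be sketched as follows. The record type, field names, and return labels are hypothetical and chosen only for illustration; the scores assigned to each match type are not part of this sketch.

```python
from typing import NamedTuple


class AminoAcidChange(NamedTuple):
    """A single amino acid substitution (hypothetical record layout)."""
    gene: str
    position: int
    ref_aa: str
    alt_aa: str


def match_type(variant: AminoAcidChange, uniprot: AminoAcidChange) -> str:
    """Compare a variant with a UniProt MUTAGENESIS record.

    Returns "identical", "similar", or "none".
    """
    same_site = (variant.gene, variant.position, variant.ref_aa) == (
        uniprot.gene, uniprot.position, uniprot.ref_aa)
    if not same_site:
        return "none"
    # Same gene, position, and reference amino acid:
    # the altered amino acid decides between identical and similar.
    return "identical" if variant.alt_aa == uniprot.alt_aa else "similar"
```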
‘PM1 Located in a mutational hot spot and/or critical and well-established functional domain (e.g., the active site of an enzyme) without benign variation.’
Critical and well-established functional domain without benign variation part
1. We evaluate whether the analyzed variant leads to an amino acid substitution, frameshift, stop gain, inframe deletion, or inframe insertion (based on the SnpEff ANN annotation, we consider only transcripts of the protein_coding biotype and with no error/warning messages added).
If this condition is not met, we do not perform other tests described for this part (functional_part_score = 0.0). Otherwise, for the protein-changing variants, we perform additional tests (points 2 and 3).
2. We analyze whether the variant lies within a UniProt functional domain without known benign variation. We define a well-established functional domain as a region annotated in the UniProt database as SIGNAL, CA_BIND, ZN_FING, DNA_BIND, ACT_SITE, or BINDING. This region cannot contain any known Benign, Benign/Likely benign, or Likely benign protein-changing variants (according to the ClinVar database).
Notes: Our algorithm may ignore benign splice acceptor/donor variants during this test.
3. Additionally, we check whether SnpEff annotated the variant as protein_protein_contact or structural_interaction_variant.
Notes: These annotations are not present in the manually built SnpEff databases.
The score from this part (functional_part_score) is:
Mutational hot spot part
We examine whether the variant occurs within a 31 bp (+/-15 bp) DNA region containing more than 2 Pathogenic, Pathogenic/Likely pathogenic, or Likely pathogenic variants (according to the ClinVar database).
Notes: In this part, variants that do not change proteins may obtain a non-zero score, which accounts for the possibility of a pathogenicity mechanism unrelated to protein dysfunction.
The score from this part (hot_spot_score) is:
The final PM1 score is the larger of both scores (functional_part_score, hot_spot_score).
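The hot spot test and the final PM1 combination can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the window is assumed to be centered on the variant position, and all positions are assumed to lie on the same chromosome. The value of hot_spot_score assigned to a positive test is not covered here.

```python
from typing import Iterable

WINDOW_HALF_WIDTH = 15  # +/-15 bp around the variant, 31 bp in total
MIN_PATHOGENIC = 2      # the region must contain more than this many variants


def in_hot_spot(position: int, pathogenic_positions: Iterable[int]) -> bool:
    """Check whether more than 2 known pathogenic variants lie in the window."""
    nearby = sum(
        1 for p in pathogenic_positions
        if abs(p - position) <= WINDOW_HALF_WIDTH
    )
    return nearby > MIN_PATHOGENIC


def pm1_score(functional_part_score: float, hot_spot_score: float) -> float:
    """The final PM1 score is the larger of the two partial scores."""
    return max(functional_part_score, hot_spot_score)
```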
‘PM2 Absent from controls (or at extremely low frequency if recessive) in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium.’
NUCLEAR VARIANTS: We examine the overall frequency in gnomAD exomes or gnomAD genomes v3 database (exomes first).
○ If the fraction of samples with coverage above 0 exceeds 0.8: the final PM2 score is 1.0;
○ If the above fraction is lower than 0.8: the final PM2 score is 0.25;
○ If the coverage data is not present in the gnomAD database: the final PM2 score is 0.25.
MITOCHONDRIAL VARIANTS: We examine the variant frequency in the MITOMAP database.
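The coverage rules listed above for nuclear variants can be sketched as follows. The sketch covers only the coverage branch and does not model the frequency condition that triggers it. The function name is hypothetical, and treating a fraction of exactly 0.8 like the lower case is our assumption, since the rules name only "exceeds" and "lower than".

```python
from typing import Optional


def pm2_coverage_score(covered_fraction: Optional[float]) -> float:
    """Map the fraction of gnomAD samples with coverage above 0 to a PM2 score.

    covered_fraction: None when gnomAD provides no coverage data.
    """
    if covered_fraction is None:
        return 0.25  # no coverage data in gnomAD
    if covered_fraction > 0.8:
        return 1.0   # well-covered site
    # Assumption: a fraction of exactly 0.8 is handled like the lower case.
    return 0.25
```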
‘PM4 Protein length changes as a result of in-frame deletions/insertions in a non-repeat region or stop-loss variants.’
We examine if a given variant leads to one of the following changes (based on SnpEff ANN field, we analyze only transcripts of the protein_coding biotype, and without added errors/warnings):
If so, we give 1.0 as the final PM4 score; otherwise, the final PM4 score is zero.
Nonrepeat region definition: not specified as simpleRepeat region (UCSC).
‘PM5 Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before
Example: Arg156His is pathogenic; now you observe Arg156Cys
Caveat: Beware of changes that impact splicing rather than at the amino acid/protein level.’
1. We start by checking if the given ISEQ variant leads to protein change (based on the SnpEff ANN field, we analyze only the subset of transcripts with the protein_coding biotype and without any errors/warnings flags). If the variant does not change any protein, its final PM5 score is 0.0, and we omit all the below tests.
2. Then, we analyze whether identical changes are already present in ClinVar. To this end, we use the specific phrase inserted by the PS1 script (Variants^introducing^the^same^change^are^classified^in^the^ClinVar^database in the ISEQ_ACMG_PS1_DESCRIPTION field). If the phrase is present, we assume that this variant is not novel and that its associations with diseases are known. Thus, in such a case, the final PM5 score is 0.0.
3. For variants not described in the ClinVar database, we examine whether the changes they lead to are similar to those caused by the known ClinVar variants (see below how the similarity is defined). To this end, we compare gene names, transcript ids, and protein level mutation (this step is performed differently for different mutation types, see below).
Definitions for the similar protein changes for distinct mutation types:
4. Finally, we assign the final PM5 score, which is:
‘PP2 Missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease.’
We examine whether the variant causes an amino acid substitution (based on the SnpEff ANN field, we analyze transcripts with the protein_coding biotype and without errors/warnings present):
○ if not: the final PP2 score is 0.0;
○ if so: we check whether the number of known pathogenic missense variants is greater than twice the number of the benign missense variants;
■ if so: we give 1.0 as the final PP2 score;
■ if not: the final PP2 score is 0.0.
Pathogenic missense variants: Pathogenic, Likely pathogenic, or Pathogenic/Likely pathogenic in the ClinVar aggregated significance record.
Benign missense variants: described in ClinVar as Benign, Benign/Likely benign, Likely benign, Uncertain significance, or having conflicting interpretations of pathogenicity.
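The PP2 decision above reduces to a single comparison, sketched below. The function and argument names are hypothetical; the counts are assumed to follow the definitions of pathogenic and benign missense variants given above.

```python
def pp2_score(is_missense: bool,
              n_pathogenic_missense: int,
              n_benign_missense: int) -> float:
    """Return the PP2 score for a variant in a given gene.

    The score is 1.0 only for missense variants in genes where the number
    of known pathogenic missense variants is greater than twice the number
    of benign ones.
    """
    if not is_missense:
        return 0.0
    return 1.0 if n_pathogenic_missense > 2 * n_benign_missense else 0.0
```

Note that with a strict comparison, a gene with no known missense variants of either kind yields a score of 0.0.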
‘PP3 Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.)’
We check the SIFT score (ISEQ_SIFT4G_MIN) for each variant (in fact, for each VCF line). The final PP3 score is:
The 0.05 threshold was chosen according to PMID: 11337480.
‘PP4 Patient’s phenotype or family history is highly specific for a disease with a single genetic etiology.’
We check the ISEQ_GENES_PANEL_SCORE. This score reflects the gene-to-phenotype match and ranges from 0 to 100%. The final PP4 score is 0.01 times the maximal panel score (the maximum over all genes affected by the variant).
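As a minimal sketch, the PP4 calculation can be written as below. The function name is hypothetical, and returning 0.0 when no gene has a panel score is our assumption.

```python
from typing import Iterable


def pp4_score(panel_scores_percent: Iterable[float]) -> float:
    """Rescale the maximal gene panel score (0 to 100%) to the 0 to 1 range.

    Assumption: a variant with no panel scores at all receives 0.0.
    """
    return 0.01 * max(panel_scores_percent, default=0.0)
```

For a variant affecting two genes with panel scores of 30% and 75%, the sketch returns 0.75.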
‘BA1 Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium.’
1. For nuclear variants, we examine the overall variant frequency in the gnomAD exomes or, if this is not given, in the gnomAD genomes v3 database;
2. For mitochondrial variants, we check the MITOMAP database;
‘BS2 Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age.’
For autosomal/X-linked variants
We check the number of homozygous (alt/alt) individuals in the gnomAD exomes (ISEQ_GNOMAD_EXOMES_nhomalt field):
For mitochondrial/Y-chromosome variants
We check the alt allele count (MITOMAP_AC for mitochondrial variants; ISEQ_GNOMAD_EXOMES_AC or, if exome data are not present, ISEQ_GNOMAD_GENOMES_V3_AC for Y-chromosome variants):
*MITOMAP: data on about 50 000 mitochondrial genotypes; 1/2 of the gnomAD database (men
‘BP1 Missense variant in a gene for which primarily truncating variants are known to cause disease.’
We examine if the given variant is a missense variant (based on the SnpEff ANN field, we consider only transcripts without warnings/errors and of the protein_coding biotype);
LoF test: checks if the given gene has known null pathogenic variants in the ClinVar database. The score from this test (lof_test_score) is:
Missense test: checks if the given gene has any pathogenic missense variants. We perform this test only for genes with a non-zero lof_test_score. The score from this test (missense_score) is:
Pathogenic missense/LoF variants: Pathogenic, Pathogenic/Likely pathogenic, or Likely pathogenic in the ClinVar database.
The final BP1 score is:
‘BP3 In-frame deletions/insertions in a repetitive region without a known function.’
We examine if a given variant leads to one of the following changes (based on the SnpEff ANN field, analyses are restricted to transcripts of the protein_coding biotype and without errors/warnings flags):
We also ensure that no more severe consequence is predicted for any of the variant's transcripts (missense, stop loss or gain, splice donor/acceptor site change, frameshift, start lost). Next, for the appropriate variants, we check whether they lie in a repeat region (specified as a simpleRepeat region in UCSC).
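The eligibility test described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the two sets of SnpEff effect terms are our assumption about how the in-frame and more severe consequences are spelled in the ANN field. The sketch returns only whether a variant qualifies, not its score.

```python
from typing import Iterable

# Assumed SnpEff effect terms for in-frame insertions/deletions.
INFRAME_EFFECTS = {
    "conservative_inframe_insertion", "conservative_inframe_deletion",
    "disruptive_inframe_insertion", "disruptive_inframe_deletion",
}
# Assumed SnpEff effect terms for the more severe consequences listed above.
MORE_SEVERE_EFFECTS = {
    "missense_variant", "stop_lost", "stop_gained", "splice_donor_variant",
    "splice_acceptor_variant", "frameshift_variant", "start_lost",
}


def bp3_candidate(transcript_effects: Iterable[str], in_simple_repeat: bool) -> bool:
    """Check the BP3 conditions for one variant.

    transcript_effects: effect string of each selected transcript;
                        combined effects are joined with "&" in the ANN field.
    """
    effects = set()
    for annotation in transcript_effects:
        effects.update(annotation.split("&"))
    if effects & MORE_SEVERE_EFFECTS:
        return False  # a more severe consequence is predicted for some transcript
    return bool(effects & INFRAME_EFFECTS) and in_simple_repeat
```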
The final BP3 score is:
‘BP4 Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc.)’
For each variant (in fact, for each ISEQ vcf line), we check the SIFT score (ISEQ_SIFT4G_MIN). The final BP4 score is:
We chose the 0.05 threshold according to PMID: 11337480.
‘BP5 Variant found in a case with an alternate molecular basis for disease.’
We check variant clinical significance in the ClinVar database.
The significance is taken from the ISEQ_AGGREGATED_CLINVAR_SIGNIFICANCE field.
The final BP5 score is:
‘BP7 A synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence nor the creation of a new splice site AND the nucleotide is not highly conserved.’
1. We check whether the given variant leads to synonymous changes (based on SnpEff ANN, we restrict all analyses to transcripts without warning/error flag, and of the protein_coding biotype).
2. Next, we verify that the variant does not lead to any more severe protein-level change (also in the selected transcripts).
The final BP7 score is:
We do not check conservation to avoid repeated evaluation of evidence already examined in the PP3 and BP4 categories.
We calculate the final ACMG score as:
ACMG SCORE =
1 x PVS1 SCORE
+ 1/2 x (PS1 SCORE + PS3 SCORE)
+ 1/4 x (PM1 SCORE + PM2 SCORE + PM4 SCORE + PM5 SCORE)
+ 1/8 x (PP2 SCORE + PP3 SCORE + PP4 SCORE)
– 1 x BA1 SCORE
– 1/2 x BS2 SCORE
– 1/4 x (BP5 SCORE + BP7 SCORE)
– 1/8 x (BP1 SCORE + BP3 SCORE + BP4 SCORE)
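The weighted sum above can be sketched in a few lines. The dictionary layout and function name are hypothetical; the weights are copied from the formula, and treating a missing subcategory as a score of 0 is our assumption.

```python
# Weights copied from the formula above: positive for the pathogenic
# subcategories, negative for the benign ones.
WEIGHTS = {
    "PVS1": 1.0,
    "PS1": 0.5, "PS3": 0.5,
    "PM1": 0.25, "PM2": 0.25, "PM4": 0.25, "PM5": 0.25,
    "PP2": 0.125, "PP3": 0.125, "PP4": 0.125,
    "BA1": -1.0,
    "BS2": -0.5,
    "BP5": -0.25, "BP7": -0.25,
    "BP1": -0.125, "BP3": -0.125, "BP4": -0.125,
}


def acmg_score(subscores: dict) -> float:
    """Combine subcategory scores (each from 0 to 1) into the ACMG score.

    Assumption: subcategories missing from `subscores` count as 0.
    """
    return sum(weight * subscores.get(name, 0.0)
               for name, weight in WEIGHTS.items())
```

For instance, a variant with a PVS1 score of 1.0 and a PM2 score of 1.0 and no other evidence obtains 1 x 1.0 + 1/4 x 1.0 = 1.25.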
The score is then used to classify variants as: