SynVar

Variant synonym generation and normalization

Background

Genetic variants are drawing increasing interest for their role in pathologies, in the design of new drugs, and in refining treatment efficacy through stratification. However, variant interpretation depends on time-consuming curation tasks. To support variant interpretation and decisions based on the latest evidence, we propose Variomes [2], a service that performs variant-specific triage of publications.

To increase the comprehensiveness of Variomes, we developed SynVar. This tool enables the generation of synonyms and normalization of variants. This task faces different challenges:

While many databases of polymorphisms and somatic variants exist, such as ClinVar, ClinGen or dbSNP, using them as terminologies has several drawbacks:

Description

To enable smooth and effective retrieval of variants in the literature, we developed a synonym generation tool. For a given variant, including variants not described in existing databases, it generates the corresponding descriptions at the genome, cDNA/transcript and protein levels, both in the HGVS format and in the many non-standard yet frequently used forms found in the literature. It supports variant expansion and normalization from any description level.
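As an illustration of what synonym generation means at the protein level, the sketch below derives a few common spellings from an HGVS protein substitution. It is a minimal example rather than SynVar's actual implementation: the function name `protein_substitution_synonyms` is hypothetical, and the set of forms it returns is a small assumed subset of what is found in the literature.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple substitutions such as p.Val600Glu (no stop codons, no indels).
HGVS_PROTEIN_SUB = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def protein_substitution_synonyms(hgvs):
    """Return a few common spellings of an HGVS protein substitution (hypothetical helper)."""
    match = HGVS_PROTEIN_SUB.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple protein substitution: {hgvs!r}")
    ref3, pos, alt3 = match.groups()
    if ref3 not in THREE_TO_ONE or alt3 not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {hgvs!r}")
    ref1, alt1 = THREE_TO_ONE[ref3], THREE_TO_ONE[alt3]
    return [
        f"p.{ref3}{pos}{alt3}",  # HGVS, three-letter codes
        f"p.{ref1}{pos}{alt1}",  # HGVS, one-letter codes
        f"{ref3}{pos}{alt3}",    # "p." prefix dropped
        f"{ref1}{pos}{alt1}",    # short form
    ]
```

For example, `protein_substitution_synonyms("p.Arg248Trp")` yields `p.Arg248Trp`, `p.R248W`, `Arg248Trp` and `R248W`.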

Supported variant types

SynVar supports the following variant types according to HGVS nomenclature:

Isoform support

SynVar can recognize and process variants specified on protein isoforms. When the optional parameter iso=true is provided, the tool expands the variant to all available isoforms of the gene. The system accepts:

Example: TP53 R248W with iso=true returns synonyms for all 9 TP53 isoforms. The variant is first validated on the canonical isoform (P04637-1). If not valid there, the system automatically searches other isoforms. With iso=true, all 9 isoforms are returned regardless of which isoform was initially validated.
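The canonical-first validation described above can be sketched as follows. This is only an illustration under stated assumptions: the function `find_valid_isoforms` is hypothetical, the sequences are expected to be supplied by the caller, and the sketch covers the validation fallback only, not the expansion to all isoforms triggered by iso=true.

```python
def find_valid_isoforms(ref_aa, position, isoforms, canonical_id):
    """Check the canonical isoform first, then fall back to the others.

    `isoforms` maps an isoform accession to its amino acid sequence.
    `position` is 1-based. Returns the accessions on which the residue
    at `position` equals `ref_aa` (hypothetical helper, not SynVar code).
    """
    def matches(sequence):
        return 1 <= position <= len(sequence) and sequence[position - 1] == ref_aa

    # The canonical isoform takes precedence when the variant is valid there.
    if matches(isoforms[canonical_id]):
        return [canonical_id]
    # Otherwise, search the remaining isoforms.
    return [
        accession
        for accession, sequence in isoforms.items()
        if accession != canonical_id and matches(sequence)
    ]
```

With toy sequences such as `{"X-1": "MKRW", "X-2": "MRKW"}` and `"X-1"` as canonical, a variant with reference residue `R` at position 3 validates on `X-1`, while one with reference residue `K` at position 3 falls back to `X-2`.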

Workflow

Use-cases

Protein variant: by default, the change is validated on the reference sequence of the canonical isoform, as retrieved through the UniProt API [3]. The valid variant is then back-translated into the possible cDNA/transcript variants using the back-translator tool from Mutalyzer [4]. Finally, each cDNA variant is mapped onto its genomic position (GRCh37 and GRCh38 builds) using VariantValidator [5].
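To make the back-translation step concrete, the sketch below lists the single-nucleotide cDNA changes that turn a reference codon into a codon for the target amino acid, using the standard genetic code. It is not the Mutalyzer back-translator: the function `back_translate` is hypothetical, the reference codon must be supplied by the caller, and changes affecting two or three nucleotides are deliberately ignored.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def back_translate(ref_codon, aa_position, alt_aa):
    """List single-nucleotide cDNA changes turning `ref_codon` into `alt_aa`.

    `aa_position` is the 1-based protein position; `alt_aa` is a one-letter code.
    """
    changes = []
    # cDNA position of the first base of the codon.
    first_nt = 3 * (aa_position - 1) + 1
    for offset, ref_base in enumerate(ref_codon):
        for alt_base in BASES:
            if alt_base == ref_base:
                continue
            alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
            if CODON_TABLE[alt_codon] == alt_aa:
                changes.append(f"c.{first_nt + offset}{ref_base}>{alt_base}")
    return changes
```

Assuming the reference codon at position 248 is `CGG`, `back_translate("CGG", 248, "W")` returns `["c.742C>T"]`. When several codon changes encode the same amino acid, all of them are returned, which is why one protein variant can map to several cDNA variants.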

cDNA/transcript variant: the variant is validated and mapped onto its genomic position using VariantValidator [5], which also translates it into the corresponding protein variant.

Genomic variant: unless it is intergenic, the variant is validated and converted to the corresponding cDNA/transcript variants using VariantValidator [5], which also provides the translation into protein variants. If the variant is intergenic, only genomic variant synonyms are generated.

dbSNP id: the genomic variants associated with the dbSNP [6] id are retrieved through the NCBI E-utilities services. The conversion and translation procedure is then the same as for a genomic variant, as described above.

ClinGen Allele Registry ID: the genomic variant corresponding to the ClinGen Allele Registry ID (CA ID) is retrieved through the ClinGen Allele Registry [7]. The conversion and translation procedure is then the same as for a genomic variant, as described above.

Output

Results are returned as a list of genomic variants (unique position and change), along with their corresponding transcript and protein variants, grouped by genes and isoforms. The output is in XML format. The main elements are the following:

Programmatic access

URL

https://synvar.sibils.org/generate/literature/fromMutation
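A request URL can be assembled with the Python standard library, as in the minimal sketch below. The parameter names `ref`, `level` and `iso` appear on this page; the name `variant` for the query parameter and the helper `build_query_url` are assumptions made for illustration and should be checked against the parameter list.

```python
from urllib.parse import urlencode

BASE_URL = "https://synvar.sibils.org/generate/literature/fromMutation"


def build_query_url(variant, ref=None, level=None, iso=False):
    """Build a SynVar request URL (hypothetical helper).

    The parameter name "variant" is an assumption; "ref", "level" and "iso"
    are the names mentioned in this documentation.
    """
    params = {"variant": variant}
    if ref is not None:
        params["ref"] = ref
    if level is not None:
        params["level"] = level
    if iso:
        params["iso"] = "true"
    # urlencode percent-encodes reserved characters such as ">" in "c.742C>T".
    return f"{BASE_URL}?{urlencode(params)}"
```

Building the URL with `urlencode` rather than string concatenation avoids problems with characters such as `>` that are common in cDNA and genomic descriptions.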

Parameters

Optional parameters

Examples

Substitutions (SNPs)

Deletions

Duplications and Insertions

Isoform-specific queries

Database identifiers

Special cases with map parameter

Automatic detection (without ref or level parameters)

Variant extraction from complex text

Normalization only (norm parameter)

Output formats

Search interface

Fields

Template program

To query the service and parse the output: queryVariant.py
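Independently of queryVariant.py, a first look at a response can be obtained with the short sketch below, which fetches a URL and counts the element tags in the returned XML. It makes no assumption about the element names used by SynVar; the helpers `fetch_xml` and `summarize_xml` are hypothetical, and the timeout value is arbitrary.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.request import urlopen


def fetch_xml(url, timeout=60):
    """Download the response body of `url` as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def summarize_xml(xml_text):
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())
```

Calling `summarize_xml(fetch_xml(url))` on a query URL gives an overview of the response structure before writing a dedicated parser.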

References

  1. Mottaz A, Pasche E, Michel PA, Mottin L, Teodoro D, Ruch P. Designing an Optimal Expansion Method to Improve the Recall of a Genomic Variant Curation-Support Service. Stud Health Technol Inform. 2022 May 25;294:839-843. doi: 10.3233/SHTI220603.
  2. Pasche E, Mottaz A, Caucheteur D, Gobeill J, Michel PA, Ruch P. Variomes: a high recall search engine to support the curation of genomic variants. Bioinformatics. 2022 Apr 28;38(9):2595-2601. doi: 10.1093/bioinformatics/btac146.
  3. The UniProt Consortium (2023). UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research, 51(D1), D523–D531. https://doi.org/10.1093/nar/gkac1052
  4. den Dunnen J. T. (2016). Sequence Variant Descriptions: HGVS Nomenclature and Mutalyzer. Current protocols in human genetics, 90, 7.13.1–7.13.19. https://doi.org/10.1002/cphg.2
  5. Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J., & Dalgleish, R. (2018). VariantValidator: Accurate validation, mapping, and formatting of sequence variation descriptions. Human mutation, 39(1), 61–68. https://doi.org/10.1002/humu.23348
  6. Smigielski, E. M., Sirotkin, K., Ward, M., & Sherry, S. T. (2000). dbSNP: a database of single nucleotide polymorphisms. Nucleic acids research, 28(1), 352–355. https://doi.org/10.1093/nar/28.1.352
  7. Pawliczek, P., Patel, R. Y., Ashmore, L. R., Jackson, A. R., Bizon, C., Nelson, T., Powell, B., Freimuth, R. R., Strande, N., Shah, N., Riegel, B., Meeks, M., Levy, M. A., Kattman, B., Berg, J. S., & Harrison, S. M. (2018). ClinGen Allele Registry links information about genetic variants. Human mutation, 39(11), 1690–1701. https://doi.org/10.1002/humu.23637