Variant synonym generation and normalization


Genetic variants are attracting increasing interest for their role in disease, for the design of new drugs, and for refining treatment efficacy through patient stratification. However, variant interpretation depends on time-consuming curation tasks. To support variant interpretation efforts and decisions based on the latest evidence, we propose Variomes [1], a service that performs variant-specific triage of publications.

To increase the comprehensiveness of Variomes, we developed SynVar, a tool that generates synonyms for variants and normalizes variant descriptions. This task faces several challenges:

Many databases of polymorphisms and somatic variants exist, such as ClinVar, COSMIC and dbSNP, but using them as terminologies has several drawbacks:


To enable smooth and effective retrieval of variants in the literature, we developed a synonym generation tool that, for a given SNP (including variants not described in existing databases), generates its corresponding descriptions at the genome, transcript and protein levels, both in the HGVS format and in many non-standard, yet frequently used, forms found in the literature. The tool supports variant expansion and normalization starting from any description level.
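As an illustration of what expansion into non-standard forms can look like, the sketch below handles one simple case: a protein substitution written in HGVS with three-letter codes. The function name `protein_synonyms` and the exact set of spellings it produces are assumptions chosen for illustration; they are not SynVar's actual list of literature forms.

```python
import re

# IUPAC three-letter to one-letter amino acid codes, plus the stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# HGVS protein substitution with three-letter codes, e.g. "p.Val600Glu".
HGVS_P_SUB = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def protein_synonyms(hgvs_p: str) -> list[str]:
    """Return HGVS and common literature spellings of a protein substitution.

    Hypothetical helper: only simple substitutions are supported.
    """
    match = HGVS_P_SUB.fullmatch(hgvs_p)
    if match is None:
        raise ValueError(f"unsupported protein variant: {hgvs_p!r}")
    ref3, pos, alt3 = match.groups()
    ref1, alt1 = THREE_TO_ONE[ref3], THREE_TO_ONE[alt3]
    return [
        f"p.{ref3}{pos}{alt3}",    # HGVS, three-letter codes
        f"p.({ref3}{pos}{alt3})",  # HGVS, predicted consequence in parentheses
        f"p.{ref1}{pos}{alt1}",    # HGVS, one-letter codes
        f"{ref3}{pos}{alt3}",      # no "p." prefix, three-letter codes
        f"{ref1}{pos}{alt1}",      # no prefix, one-letter codes (e.g. V600E)
    ]
```

For example, `protein_synonyms("p.Val600Glu")` includes both `p.V600E` and the bare `V600E` that is common in abstracts, so a search can match either spelling.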



Protein variant: the amino acid change is validated against the reference sequence of the canonical isoform (by default), as retrieved through the neXtProt API [2]. The validated variant is then back-translated into the possible transcript variants using the back-translator tool from Mutalyzer [3]. Finally, each transcript variant is mapped onto its genomic position (GRCh37 and GRCh38 builds) using VariantValidator [4].
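The back-translation step can be sketched locally as follows. This is a stand-in for illustration, not Mutalyzer's API: it assumes the reference codon of the affected residue is already known from the transcript's coding sequence, and it only enumerates single-nucleotide substitutions. The function name `back_translate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, first base slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def back_translate(ref_codon: str, protein_pos: int, alt_aa: str) -> list[str]:
    """List single-nucleotide c. substitutions that turn ref_codon into alt_aa.

    alt_aa is a one-letter code ("*" for stop); protein_pos is 1-based.
    """
    ref_codon = ref_codon.upper()
    first_base = (protein_pos - 1) * 3 + 1  # c. position of the codon's first base
    variants = []
    for offset, ref_base in enumerate(ref_codon):
        for alt_base in BASES:
            if alt_base == ref_base:
                continue
            mutated = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
            if CODON_TABLE[mutated] == alt_aa:
                variants.append(f"c.{first_base + offset}{ref_base}>{alt_base}")
    return variants
```

With codon 600 of BRAF (GTG), `back_translate("GTG", 600, "E")` returns `["c.1799T>A"]`. Multi-nucleotide changes such as GTG>GAA, which also encode Glu, are outside this sketch.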

Transcript variant: the variant is validated and mapped onto its genomic position using VariantValidator, then translated into the corresponding protein variant using Mutalyzer.
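The translation step is delegated to Mutalyzer in the actual pipeline. As a local illustration only, the sketch below translates a single-nucleotide coding substitution into an HGVS protein description, assuming the caller supplies the reference codon containing the changed base. The function name `translate_substitution` is hypothetical, and intronic or UTR positions (such as `c.-5` or `c.100+1`) are out of scope.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, first base slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}
# Coding-sequence substitution, e.g. "c.1799T>A".
C_SUB = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")


def translate_substitution(hgvs_c: str, ref_codon: str) -> str:
    """Translate a coding substitution into an HGVS protein description."""
    match = C_SUB.fullmatch(hgvs_c)
    if match is None:
        raise ValueError(f"unsupported transcript variant: {hgvs_c!r}")
    pos, ref_base, alt_base = int(match.group(1)), match.group(2), match.group(3)
    offset = (pos - 1) % 3            # position of the change within its codon
    protein_pos = (pos - 1) // 3 + 1  # 1-based residue number
    ref_codon = ref_codon.upper()
    if ref_codon[offset] != ref_base:
        raise ValueError("reference base does not match the supplied codon")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = ONE_TO_THREE[CODON_TABLE[ref_codon]]
    alt_aa = ONE_TO_THREE[CODON_TABLE[alt_codon]]
    if ref_aa == alt_aa:
        return f"p.{ref_aa}{protein_pos}="  # synonymous change
    return f"p.{ref_aa}{protein_pos}{alt_aa}"
```

For instance, `translate_substitution("c.1799T>A", "GTG")` yields `p.Val600Glu`, the inverse of the back-translation shown for protein variants.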

Genomic variant: the variant is validated using VariantValidator and, if it is not intergenic, converted into the corresponding transcript variants, which are then translated into protein variants using Mutalyzer. For intergenic variants, only genomic variant synonyms are generated.
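A minimal sketch of how a client might prepare a VariantValidator query for a genomic variant, and apply the intergenic rule above. The REST path pattern (`/VariantValidator/variantvalidator/{build}/{description}/{transcripts}`) is an assumption based on the public VariantValidator REST interface and should be checked against its current documentation. Both helper names are hypothetical, and no request is sent.

```python
from urllib.parse import quote

# Assumed base URL of the public VariantValidator REST service.
VV_BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator"


def variantvalidator_url(description: str, build: str = "GRCh38",
                         transcripts: str = "all") -> str:
    """Build a (hypothetical) VariantValidator request URL for one HGVS description."""
    if build not in ("GRCh37", "GRCh38"):
        raise ValueError(f"unsupported genome build: {build}")
    # Percent-encode ':' and '>' so the description fits in one path segment.
    encoded = quote(description, safe="")
    return f"{VV_BASE}/{build}/{encoded}/{quote(transcripts, safe='')}"


def synonym_levels(transcript_variants: list[str]) -> list[str]:
    """Choose which description levels to generate for a genomic variant."""
    # An intergenic variant maps to no transcript: only genomic synonyms apply.
    if not transcript_variants:
        return ["genomic"]
    return ["genomic", "transcript", "protein"]
```

For a genomic description such as `NC_000007.14:g.140753336A>T`, the URL path ends in `/GRCh38/NC_000007.14%3Ag.140753336A%3ET/all`; fetching and parsing the response is left to the caller.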

dbSNP id: the genomic variants associated with the dbSNP [5] identifier are retrieved through the NCBI E-utilities services. They are then converted and translated following the same procedure as for genomic variants.
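The retrieval step could look roughly like the following sketch. The ESummary endpoint and the `db=snp` database are part of NCBI E-utilities, but the layout of the returned `docsum` field (`|`-separated `KEY=VALUE` pairs, with comma-separated expressions under `HGVS`) is an assumption that should be verified against real responses. The helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(rs_id: str) -> str:
    """Build an ESummary URL for a dbSNP identifier such as 'rs113488022'."""
    match = re.fullmatch(r"[rR][sS](\d+)", rs_id.strip())
    if match is None:
        raise ValueError(f"not a dbSNP rs identifier: {rs_id!r}")
    # dbSNP expects the numeric part only.
    params = {"db": "snp", "id": match.group(1), "retmode": "json"}
    return f"{EUTILS_ESUMMARY}?{urlencode(params)}"


def genomic_hgvs(docsum: str) -> list[str]:
    """Keep only genomic (NC_ ... :g.) HGVS expressions from a docsum string.

    Assumes the HGVS=expr1,expr2,... layout described above.
    """
    for field in docsum.split("|"):
        key, _, value = field.partition("=")
        if key == "HGVS":
            return [e for e in value.split(",") if e.startswith("NC_") and ":g." in e]
    return []
```

Each genomic expression kept this way would then go through the genomic-variant procedure.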

COSMIC id: the transcript variant corresponding to the COSMIC identifier is retrieved from the downloadable COSMIC data [6]. It is then mapped onto the genome and translated following the same procedure as for transcript variants.
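Since the COSMIC data are used as downloaded files rather than a web service, a lookup table can be built once and reused. The sketch below assumes a tab-separated export with a header row; the default column names `LEGACY_MUTATION_ID` and `HGVSC` are assumptions (COSMIC column names differ between releases), so they are exposed as parameters. The function name is hypothetical.

```python
import csv
from collections import defaultdict
from collections.abc import Iterable


def load_cosmic_index(lines: Iterable[str],
                      id_column: str = "LEGACY_MUTATION_ID",
                      hgvsc_column: str = "HGVSC") -> dict[str, set[str]]:
    """Map each COSMIC mutation id to the transcript variants listed for it.

    `lines` can be an open text file or any iterable of TSV lines.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        cosmic_id, hgvsc = row.get(id_column), row.get(hgvsc_column)
        if cosmic_id and hgvsc:  # skip rows with a missing id or HGVS c. field
            index[cosmic_id].add(hgvsc)
    return dict(index)
```

Streaming rows through `csv.DictReader` keeps memory proportional to the index rather than to the full export, which matters for large COSMIC files.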


Results are returned as a list of genomic variants (each defined by a unique position and change), along with their corresponding transcript and protein variants, grouped by gene and isoform. The output is in XML format. The main elements are the following:

