Comparison of genome sequences of SARS-CoV-2 and other coronaviruses

In a recent study posted to Research Square*, researchers compared the genomic sequence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with those of other coronaviruses (CoVs).

Study: Different genomic representations of novel pathogens base on signal processing algorithms: COVID-19 case study.


CoVs are ribonucleic acid (RNA) viruses that cause respiratory and digestive tract infections. They carry a spike (S) protein that mediates entry into host cells. CoVs belong to the family Coronaviridae and are classified into four genera: Alpha, Beta, Gamma, and Delta. Alpha- and Beta-CoVs infect mammals, whereas Gamma- and Delta-CoVs predominantly infect birds.

Phylogenetic studies have revealed the complex evolutionary history of CoVs. These viruses use ‘template switching,’ a mechanism that leads to high rates of homologous RNA recombination. Pathogenic human CoVs include SARS-CoV, which caused the SARS outbreak; Middle East respiratory syndrome (MERS)-CoV, which caused the MERS outbreak; and SARS-CoV-2, the etiologic agent of the current coronavirus disease 2019 (COVID-19) pandemic.

SARS-CoV-2, a Beta-CoV, is an enveloped virus with a single-stranded (ss), positive-sense RNA genome of approximately 29.9 kilobases (kb). The genome harbors 11 open reading frames (ORFs) flanked by 5’ and 3’ untranslated regions (UTRs). A recent study suggested that the SARS-CoV-2 genome arose through recombination between bat and pangolin CoVs.

The study and findings

In the present study, the researchers used bioinformatic and signal processing tools to characterize genomic variation among different CoVs and to explore the origin of SARS-CoV-2. They compiled a library of 26 publicly available CoV genome sequences from GenBank, including SARS-CoV-2. First, they constructed a chaos game representation (CGR) image for each sequence. CGR is an iterative mapping method in which each successive nucleotide of a sequence is assigned a coordinate (X, Y) in two-dimensional (2D) space.
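The article does not give the authors' exact CGR construction, so the following is a minimal Python sketch of the standard chaos game representation. It assumes the common convention of placing A, C, G, and T at the corners of a unit square and starting at the centre; the corner layout and the function name `chaos_game_representation` are illustrative assumptions. Each nucleotide moves the current point halfway toward its corner, so a point's position encodes the sequence context that precedes it.

```python
import numpy as np

# Assumed corner layout (a common CGR convention); the study's exact layout
# is not stated in the article.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def chaos_game_representation(sequence: str) -> np.ndarray:
    """Return an (n, 2) array of CGR points, one per valid nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        if base == "U":
            base = "T"  # treat RNA uracil as thymine
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward this nucleotide's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return np.array(points, dtype=float).reshape(-1, 2)


if __name__ == "__main__":
    print(chaos_game_representation("ACGT"))
```

Plotting the returned points as a scatter plot gives the CGR image; for `"ACGT"`, the last point is (0.78125, 0.40625).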

Each CGR image was partitioned into equal-sized sub-images, and the center point (centroid) of each sub-region was computed. For each sub-region, the researchers then determined the distance between the centroid of the SARS-CoV-2 genome and that of each of the other sequences. Next, they performed electron-ion interaction pseudopotential (EIIP) mapping to convert the genomic sequences into numerical signals, which they analyzed with smoothed discrete Fourier transform (SDFT) and continuous wavelet transform (CWT) methods.
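The sub-region centroid comparison could be implemented roughly as follows. This sketch assumes CGR points lying in the unit square (as produced by a standard CGR step), a hypothetical 4 × 4 grid, and Euclidean distance between matching centroids; the article does not state how many sub-images the authors used or which distance measure they applied, and the function names are illustrative.

```python
import numpy as np


def subregion_centroids(points: np.ndarray, grid: int = 4) -> np.ndarray:
    """Centroid of the CGR points in each cell of a grid x grid partition.

    Returns an array of shape (grid, grid, 2); cells with no points are NaN.
    """
    centroids = np.full((grid, grid, 2), np.nan)
    # Map each coordinate in [0, 1] to a cell index 0..grid-1.
    idx = np.minimum((points * grid).astype(int), grid - 1)
    for i in range(grid):
        for j in range(grid):
            mask = (idx[:, 0] == i) & (idx[:, 1] == j)
            if mask.any():
                centroids[i, j] = points[mask].mean(axis=0)
    return centroids


def centroid_distances(points_a: np.ndarray, points_b: np.ndarray,
                       grid: int = 4) -> np.ndarray:
    """Euclidean distance between matching sub-region centroids of two CGRs.

    Entries are NaN where either sequence has no points in that cell.
    """
    ca = subregion_centroids(points_a, grid)
    cb = subregion_centroids(points_b, grid)
    return np.linalg.norm(ca - cb, axis=-1)
```

Small distances across most cells would indicate that two genomes fill their CGR images in a similar way.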

Additionally, the authors assessed similarities among the sequences with the Clustal X tool, followed by a recombination analysis using the Simplot tool. The genomic sequences were transformed into numerical representations as 2D CGR images. The researchers found similarities between the SARS-CoV-2 genome and five other CoV sequences: bat CoVs RaTG13, CoVZC45, and CoVZXC21, and pangolin CoVs GXP2V and MP789. Of these, RaTG13, GXP2V, and MP789 were the closest to SARS-CoV-2.
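Simplot-style analysis plots the similarity between a query genome and reference genomes in a window that slides along a multiple sequence alignment, such as one produced by Clustal X. The Python sketch below illustrates the idea for a single pre-aligned pair. It is not Simplot's own code, and the window and step sizes, the gap handling (alignment columns with a '-' in either sequence are skipped), and the name `sliding_identity` are assumptions.

```python
def sliding_identity(query: str, reference: str,
                     window: int = 200, step: int = 20):
    """Percent identity in sliding windows over two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    Returns a list of (window_start, percent_identity) tuples.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    last_start = max(len(query) - window, 0)
    for start in range(0, last_start + 1, step):
        q = query[start:start + window].upper()
        r = reference[start:start + window].upper()
        # Keep only columns where both sequences have a residue.
        pairs = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
        if not pairs:
            continue  # window is all gaps; nothing to compare
        matches = sum(a == b for a, b in pairs)
        results.append((start, 100.0 * matches / len(pairs)))
    return results
```

Running this against several references and plotting each curve shows which reference is closest in each region; a switch in the closest reference along the genome is the kind of pattern recombination analyses look for.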

The similarities between the SARS-CoV-2 genome and these five sequences were confirmed by applying SDFT and CWT to the EIIP signals. Simplot analysis revealed similarity between the SARS-CoV-2 genome and both pangolin CoV MP789 and bat CoV RaTG13.
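The EIIP mapping and a spectral comparison could be sketched as follows. The EIIP values are those commonly used in genomic signal processing (A = 0.1260, C = 0.1340, G = 0.0806, T = 0.1335). The article does not describe the authors' SDFT in detail, so the smoothing here is an assumed moving average over the DFT power spectrum, and comparing spectra by Pearson correlation after zero-padding or truncating to a common length (`n_fft`) is also an assumption. The CWT step is not shown.

```python
import numpy as np

# Commonly used EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_signal(sequence: str) -> np.ndarray:
    """Map a nucleotide sequence to its EIIP numerical signal."""
    seq = sequence.upper().replace("U", "T")
    return np.array([EIIP[b] for b in seq if b in EIIP], dtype=float)


def smoothed_power_spectrum(signal: np.ndarray, n_fft: int = 4096,
                            smooth: int = 9) -> np.ndarray:
    """DFT power spectrum smoothed by a moving average (assumed smoothing)."""
    centered = signal - signal.mean()  # remove the DC component
    # rfft zero-pads or truncates to n_fft, giving a fixed spectrum length.
    spectrum = np.abs(np.fft.rfft(centered, n=n_fft)) ** 2
    kernel = np.ones(smooth) / smooth
    return np.convolve(spectrum, kernel, mode="same")


def spectral_correlation(seq_a: str, seq_b: str, n_fft: int = 4096,
                         smooth: int = 9) -> float:
    """Pearson correlation between the smoothed spectra of two sequences."""
    sa = smoothed_power_spectrum(eiip_signal(seq_a), n_fft, smooth)
    sb = smoothed_power_spectrum(eiip_signal(seq_b), n_fft, smooth)
    return float(np.corrcoef(sa, sb)[0, 1])
```

Fixing `n_fft` keeps spectra from genomes of slightly different lengths on the same frequency grid, so they can be correlated point by point.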


To summarize, the authors proposed original genome processing and identification methods that can be used to compare coronavirus genomes and to identify similarities between human-infecting viruses such as SARS-CoV-2 and related viruses of the same family that infect other species. According to the authors, such investigations are crucial to understanding the origin and evolution of the SARS-CoV-2 genome.

The researchers compared the results of the different DNA representation methods with one another and with those of traditional methods, such as Simplot analysis and BLAST comparison, to identify possible recombination events.

The new algorithms proposed by the authors are based on nucleotide frequencies and could help classify and identify many other DNA sequences by analyzing the correlation spectra between sequences. The authors suggest that the phylogenetic trees obtained from this numerical, sequence-based classification reflect the performance of their algorithms, and they hope their methods can help solve complex biological problems.
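As a simple illustration of frequency-based sequence comparison, the sketch below builds normalised k-mer frequency profiles for two sequences and correlates them. This is a generic technique swapped in for illustration, not the authors' algorithm; k = 3 and the function names are assumptions. Counting CGR points in a 2^k × 2^k grid gives essentially the same k-mer counts, which links this view to the CGR images.

```python
from itertools import product

import numpy as np


def kmer_frequencies(sequence: str, k: int = 3) -> np.ndarray:
    """Normalised counts of all 4**k DNA k-mers, in lexicographic ACGT order."""
    seq = sequence.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing ambiguous symbols
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def frequency_correlation(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Pearson correlation between the k-mer frequency profiles."""
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return float(np.corrcoef(fa, fb)[0, 1])
```

A matrix of such correlations across all genomes could then serve as input to a clustering or tree-building step.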

*Important notice

Research Square publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Touati R, Touati M, Benzarti F, Kumar V, Kharrat M, Elngar AA. (2022). Different Genomic Representations of Novel Pathogens Base on Signal Processing Algorithms: COVID-19 Case Study. Research Square. doi: 10.21203/

Written by

Tarun Sai Lomte

Tarun is a writer based in Hyderabad, India. He has a Master’s degree in Biotechnology from the University of Hyderabad and is enthusiastic about scientific research. He enjoys reading research papers and literature reviews and is passionate about writing.
