A viral quasispecies is understood as a collection of closely related viral genomes produced by viruses with low replication fidelity. RNA viruses show a high replication error rate due to a lack of proofreading mechanisms. It is estimated that for viruses with typically high replicative loads, every possible point mutation and many double mutations are generated with each viral replication cycle, and these may be present within the population at any time. Given this inherent dynamic, we may be interested in comparing the viral diversity indices between sequential samples from a single patient or between samples from groups of patients. These comparisons can provide information on the patient’s clinical progression or the appropriateness of a given treatment.
QSUtils is a package intended for use with quasispecies amplicon data obtained by NGS, but it could also be useful for analyzing 16S/18S ribosomal-based metagenomics or tumor genetic diversity by amplicons.
In this tutorial, we illustrate how the functions provided in the package can be used to simulate quasispecies data. This implies simulation of closely related genomes, (eventually with segregating subpopulations at higher genetic distances) and their abundances.
In particular, we can differentiate between acute and chronic infection profiles by the quasispecies composition. An acute infection is expected to show a prominent genome that is highly abundant, together with a set of low-abundance genomes. On the other hand, besides the implicit dynamics, a chronic infection is expected to show a number of relatively abundant genomes together with a myriad of derived genomes at low and very low abundances.
In viral terms, the fitness of a genome is a measure of its replicative performance. High-fitness haplotypes show high abundances after a transient state, during which they overcome other genomes in the quasispecies. During infection of a human host, the quasispecies typically shows variations in the fitness of each genome caused by changes in the bioenvironment. Because of this dynamic, we may observe the profile of a typical acute infection also in chronic patients, at least at the level of magnification provided by current NGS technology.
A few functions in the package were designed to simulate quasispecies composition, with the aim of studying the statistical properties of the diversity indices (Gregori et al. 2016) (Gregori et al. 2014).
BiocManager::install("QSutils")
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'QSutils'
library(QSutils)
Two different types of information define the quasispecies composition: the genomes present and their current frequency (abundance) in the viral population. The package provides three ways to simulate abundance with various decreasing profiles.
\(fn.ab\) setting the argument \(fn\) to \(pf\) computes consecutive fractions, given the frequency of the most abundant haplotype, according to:
\[h ~r^{(i-1)}, ~~ r<1, ~~ i=1..n\] Using \(table\) on the \(fn.ab\) result, we obtain the number of haplotypes for each frequency:
table(fn.ab(25,fn="pf"))
##
## 1 2 4 9 19 39 78 156 312 625 1250 2500 5000
## 12 1 1 1 1 1 1 1 1 1 1 1 1
## 10000
## 1
par(mfrow=c(1,2))
plot(fn.ab(25,fn="pf"),type="h")
plot(fn.ab(25,r=0.7,fn="pf"),type="h")
The default value for \(r\) is 0.5. Higher \(r\) values moderate the decrease of abundances. By default, the frequency of the most abundant haplotype, \(h\), is 10000.
\(fn.ab\) with the argument \(fn="pcf"\) computes the power of consecutive fractions, according to:
\[h ~ \frac{1}{i^{r}}, ~~ r>0, ~~ i=1..n\] Default values are 0.5 for \(r\) and 10,000 for \(h\). The higher the \(r\), the more pronounced the decrease in frequencies.
table(fn.ab(25,r=3,fn="pcf"))
##
## 1 2 3 4 5 7 10 13 19 29 46 80 156
## 8 3 1 1 1 1 1 1 1 1 1 1 1
## 370 1250 10000
## 1 1 1
table(fn.ab(25,r=2,fn="pcf"))
##
## 16 17 18 20 22 25 27 30 34 39 44 51 59
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 69 82 100 123 156 204 277 400 625 1111 2500 10000
## 1 1 1 1 1 1 1 1 1 1 1 1
par(mfrow=c(1,2))
plot(fn.ab(25,r=3,fn="pcf"),type="h")
plot(fn.ab(25,r=2,fn="pcf"),type="h")