Sequence Map Vignette

Introduction

One way to visualize results, whether generated within idpr or obtained from other sources, is the sequenceMap() function. It displays the entire sequence and colors each residue according to a chosen property, which may help identify important residues within a protein.

As few as a single amino acid can be sufficient for an IDP to dock to its binding partner (Warren & Shechter, 2017). It is therefore important to view sequence-wide data in the context of the primary sequence.

To make a sequence map, two vectors of equal length are needed (typically two columns of a data frame). One vector is the amino acid sequence, with one residue per element; the other holds the values used to color each residue according to some property.
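As a minimal sketch of preparing such input from another source, the base R code below splits a short sequence string into single residues and pairs each residue with a value. The fragment is the first ten residues of the TP53 sequence used in this vignette; the names `exampleSeq` and `exampleDF`, the `Score` column, and its values are made up purely for illustration.

```r
# Hypothetical example data: a short sequence fragment and made-up scores
exampleSeq <- "MEEPQSDPSV"
residues <- strsplit(exampleSeq, split = "")[[1]]  # one character per residue

exampleDF <- data.frame(
  Position = seq_along(residues),
  AA = residues,
  Score = c(0.1, 0.8, 0.7, 0.4, 0.3, 0.5, 0.9, 0.4, 0.5, 0.2),  # illustrative values only
  stringsAsFactors = FALSE
)

# Both vectors must describe the same residues, in the same order
stopifnot(length(exampleDF$AA) == length(exampleDF$Score))
head(exampleDF, 3)
```

The two columns could then be supplied as `sequence = exampleDF$AA` and `property = exampleDF$Score`, in the same way as the idpr-generated data frames used in the rest of this vignette.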

The examples use the H. sapiens TP53 sequence, acquired from UniProt (UniProt Consortium, 2019) and stored within the idpr package.

Installation

The package can be installed from Bioconductor with the following line of code. It requires the BiocManager package to be installed.

#BiocManager::install("idpr")

The most recent version of the package can be installed with the following line of code. This requires the devtools package to be installed.

#devtools::install_github("wmm27/idpr")

Basics of sequenceMap

library(idpr) #load the idpr package
P53_HUMAN <- TP53Sequences[2]
print(P53_HUMAN)
#>                                                                                                                                                                                                                                                                                                                                                                                            P04637|P53_HUMAN 
#> "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD"

The property values can be discrete, like the output of structuralTendency(), or continuous, like the output of chargeCalculationGlobal().

tendencyDF <- structuralTendency(sequence = P53_HUMAN)
head(tendencyDF)
#>   Position AA           Tendency
#> 1        1  M    Order Promoting
#> 2        2  E Disorder Promoting
#> 3        3  E Disorder Promoting
#> 4        4  P Disorder Promoting
#> 5        5  Q Disorder Promoting
#> 6        6  S Disorder Promoting

chargeDF <- chargeCalculationGlobal(sequence = P53_HUMAN,
                                    includeTermini = FALSE)
head(chargeDF)
#>   Position AA     Charge
#> 1        1  M  0.0000000
#> 2        2  E -0.9974244
#> 3        3  E -0.9974244
#> 4        4  P  0.0000000
#> 5        5  Q  0.0000000
#> 6        6  S  0.0000000

structuralTendency() returns the amino acid sequence in the column ‘AA’ and the matched values in the column ‘Tendency’.

chargeCalculationGlobal() returns the amino acid sequence in the column ‘AA’ and the calculated values in the column ‘Charge’.

sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency)


sequenceMap(
  sequence = as.character(chargeDF$AA), #character vector
  property = chargeDF$Charge, #numeric (continuous) values
  customColors = c("blue", "red", "grey30"))

Customizations

There are multiple customization options to improve the graphic. One is the content of the labels. The sequence can be labeled with both the amino acid residues and their positions in the sequence, with only one of the two, or with neither. This is specified by the ‘labelType’ argument.

sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "AA") #Only AA residue Labels


sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number") #Only residue number labels

You can also choose where the labels are plotted: either “on” or “below” the sequence graphic. This is done with the ‘labelLocation’ argument.

sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number",
  labelLocation = "on") #Residue numbers printed on the sequence graphic



sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number",
  labelLocation = "below") #Residue numbers printed below the sequence graphic

The text can also be rotated, via the ‘rotationAngle’ argument, for ease of reading. This is especially helpful for larger sequences with dense graphics.

sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number",
  labelLocation = "on",
  rotationAngle = 90)

To avoid an overwhelming visual, you can also control how many labels are shown with the ‘everyN’ argument, which prints a label for every Nth residue. With everyN = 1, every residue is labeled; with everyN = 10, every 10th residue is labeled.

sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number",
  labelLocation = "on",
  everyN = 1) #Every residue


sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number",
  labelLocation = "on",
  everyN = 2) #Every 2nd (or every other)


sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number",
  labelLocation = "on",
  everyN = 10) #Every 10th residue is printed
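To get a feel for how strongly everyN thins out the labels, the base R sketch below counts how many residues would carry a label for a given N. It assumes that the labeled positions are the multiples of N, which is an assumption made for illustration and is not taken from the sequenceMap() source code. `labeledPositions` is a hypothetical helper, not part of idpr.

```r
# Hypothetical helper (not part of idpr): positions that are multiples of everyN.
# Assumption: labels fall on positions N, 2N, 3N, ...
labeledPositions <- function(seqLength, everyN) {
  positions <- seq_len(seqLength)
  positions[positions %% everyN == 0]
}

seqLength <- 393  # length of the canonical human p53 sequence (P04637)

length(labeledPositions(seqLength, everyN = 1))   # 393 labels
length(labeledPositions(seqLength, everyN = 2))   # 196 labels
length(labeledPositions(seqLength, everyN = 10))  # 39 labels
```

Under this assumption, going from everyN = 1 to everyN = 10 reduces the label count roughly tenfold, which is why larger values suit longer sequences.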

You can also change the number of residues on each line with nbResidues. This helps improve visualization based on the sequence size. Visualizing a larger sequence will likely benefit from a larger nbResidues value.

sequenceMap(
  sequence = tendencyDF$AA,
  property = tendencyDF$Tendency,
  labelType = "number",
  labelLocation = "on",
  nbResidues = 15) #15 residues each row
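The effect of nbResidues on the overall layout can be estimated with simple arithmetic. The sketch below assumes that rows are filled in order and that only the last row may be incomplete, so the number of rows is the sequence length divided by nbResidues, rounded up. `rowsNeeded` is a hypothetical helper written for this illustration and is not part of idpr.

```r
# Hypothetical helper (not part of idpr): rows needed for a given row width.
# Assumption: every row except possibly the last holds exactly nbResidues residues.
rowsNeeded <- function(seqLength, nbResidues) {
  ceiling(seqLength / nbResidues)
}

seqLength <- 393  # length of the canonical human p53 sequence (P04637)

rowsNeeded(seqLength, nbResidues = 15)  # 27 rows
rowsNeeded(seqLength, nbResidues = 30)  # 14 rows
rowsNeeded(seqLength, nbResidues = 50)  # 8 rows
```

A smaller nbResidues gives a taller graphic with larger tiles, while a larger value gives fewer, denser rows; which is easier to read depends on the sequence length and the output size.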