One way to visualize results both within idpr and with data from other sources is the sequenceMap() function. The purpose of this function is to show the entire sequence and color residues based on properties. This may help identify important residues within a protein.
As few as a single amino acid can be sufficient for an IDP to dock to its binding partner (Warren & Shechter, 2017). It is therefore important to view sequence-wide data in the context of the primary sequence.
To make a sequence map, two vectors of equal length are needed (typically as columns of a data frame). One vector is the amino acid sequence, and the other holds the values used to color each residue based on some property.
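As a minimal sketch of the input described above, the two vectors can be assembled by hand in base R when the data come from another source. The peptide string, the set of residues treated as hydrophobic, and the column names are illustrative assumptions, not idpr output. The sequenceMap() call is guarded so the block still runs when idpr is not installed.

```r
# Hypothetical short peptide, used only for illustration.
peptide <- "MEEPQSDPSV"

# Vector 1: the sequence, one amino acid per element.
residues <- strsplit(peptide, split = "")[[1]]

# Vector 2: one value per residue. Here, a made-up discrete label.
hydrophobic <- c("A", "I", "L", "M", "F", "V", "W")  # assumed grouping for this example
label <- ifelse(residues %in% hydrophobic, "Hydrophobic", "Other")

# Keeping both vectors in one data frame keeps them the same length.
mapDF <- data.frame(Position = seq_along(residues),
                    AA = residues,
                    Label = label,
                    stringsAsFactors = FALSE)

# Plot only if idpr is available.
if (requireNamespace("idpr", quietly = TRUE)) {
  idpr::sequenceMap(sequence = mapDF$AA,
                    property = mapDF$Label)
}
```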
The examples below use the H. sapiens TP53 sequence, acquired from UniProt (UniProt Consortium 2019) and stored within the idpr package.
The package can be installed from Bioconductor with the following line of code. It requires the BiocManager package to be installed.
The most recent version of the package can be installed with the following line of code. This requires the devtools package to be installed.
P53_HUMAN <- TP53Sequences[2]
print(P53_HUMAN)
#> P04637|P53_HUMAN
#> "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD"
The values can be discrete, like the output of structuralTendency(), or continuous, like the output of chargeCalculationGlobal().
tendencyDF <- structuralTendency(sequence = P53_HUMAN)
head(tendencyDF)
#> Position AA Tendency
#> 1 1 M Order Promoting
#> 2 2 E Disorder Promoting
#> 3 3 E Disorder Promoting
#> 4 4 P Disorder Promoting
#> 5 5 Q Disorder Promoting
#> 6 6 S Disorder Promoting
chargeDF <- chargeCalculationGlobal(sequence = P53_HUMAN,
includeTermini = FALSE)
head(chargeDF)
#> Position AA Charge
#> 1 1 M 0.0000000
#> 2 2 E -0.9974244
#> 3 3 E -0.9974244
#> 4 4 P 0.0000000
#> 5 5 Q 0.0000000
#> 6 6 S 0.0000000
The output of structuralTendency() returns both the amino acid sequence in the column ‘AA’ and the matched values in ‘Tendency’.
The output of chargeCalculationGlobal() returns both the amino acid sequence in the column ‘AA’ and the calculated values in ‘Charge’.
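A continuous property is passed to sequenceMap() in the same way as a discrete one. The sketch below rebuilds the first six rows of the chargeDF output shown above by hand, so it does not depend on the package data. The name chargeHead is an assumption made for this illustration, and the plotting call is skipped when idpr is not installed.

```r
# First six rows of chargeDF, copied from the head(chargeDF) output above.
chargeHead <- data.frame(
  Position = 1:6,
  AA = c("M", "E", "E", "P", "Q", "S"),
  Charge = c(0, -0.9974244, -0.9974244, 0, 0, 0),
  stringsAsFactors = FALSE
)

# The property vector is numeric, so residues are colored by a continuous value.
if (requireNamespace("idpr", quietly = TRUE)) {
  idpr::sequenceMap(sequence = chargeHead$AA,
                    property = chargeHead$Charge)
}
```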
There are multiple customization options to allow for improved graphing. One is the organization of the labels. Each residue can be labeled with both its amino acid and its position in the sequence, with only one of the two, or with neither. This is specified by the ‘labelType’ argument.
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "AA") #Only AA residue Labels
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "number") #Only residue number labels
You can also choose where the labels are plotted, either “on” or “below” the sequence graphic. This is done with the ‘labelLocation’ argument.
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "number",
labelLocation = "on") #Residue numbers printed on the sequence graphic
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "number",
labelLocation = "below") #Residue numbers printed below the sequence graphic
The text can also be rotated, via the ‘rotationAngle’ argument, for ease of reading. This is especially helpful for larger sequences with dense graphics.
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "number",
labelLocation = "on",
rotationAngle = 90)
To avoid an overwhelming visual, you can also specify the pattern of labels with everyN, which prints a label for every Nth residue. With everyN = 1, every residue is labeled; with everyN = 10, every 10th residue is labeled.
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "number",
labelLocation = "on",
everyN = 1) #Every residue
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "number",
labelLocation = "on",
everyN = 2) #Every 2nd (or every other)
sequenceMap(
sequence = tendencyDF$AA,
property = tendencyDF$Tendency,
labelType = "number",
labelLocation = "on",
everyN = 10) #Every 10th residue is printed
You can also change the number of residues on each line with nbResidues. This helps improve visualization based on the sequence size. Visualizing a larger sequence will likely benefit from a larger nbResidues value.
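As a rough sketch of how nbResidues changes the layout, the number of lines in the map can be estimated as the sequence length divided by nbResidues, rounded up. The helper rowsNeeded() is hypothetical and written only for this illustration, and the values 30 and 50 are arbitrary choices. The plotting call is skipped when idpr is not installed.

```r
# Hypothetical helper: lines needed to draw a sequence of a given length.
rowsNeeded <- function(seqLength, nbResidues) {
  ceiling(seqLength / nbResidues)
}

p53Length <- 393            # number of residues in the TP53 sequence used above
rowsNeeded(p53Length, 30)   # 14 lines
rowsNeeded(p53Length, 50)   # 8 lines

# Draw the map with 50 residues per line, labeling every 10th position.
if (requireNamespace("idpr", quietly = TRUE)) {
  library(idpr)
  tendencyDF <- structuralTendency(sequence = TP53Sequences[2])
  sequenceMap(
    sequence = tendencyDF$AA,
    property = tendencyDF$Tendency,
    nbResidues = 50,
    labelType = "number",
    everyN = 10)
}
```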