VectraPolarisData

Julia Wrobel and Tusharkanti Ghosh¹

¹Department of Biostatistics and Informatics, Colorado School of Public Health

17 April 2025

Abstract

The VectraPolarisData ExperimentHub package provides two large multiplex immunofluorescence datasets collected on the Akoya Biosciences Vectra 3 and Vectra Polaris platforms. Image preprocessing (cell segmentation and phenotyping) was performed using inForm software. Data are formatted into objects of class SpatialExperiment.

Package

VectraPolarisData 1.12.0

1 Loading the data

To retrieve a dataset, we can use the dataset’s corresponding named function <id>(), where <id> is a valid dataset identifier (see ?VectraPolarisData). Below, both the lung and ovarian cancer datasets are loaded this way.

library(VectraPolarisData)
spe_lung <- HumanLungCancerV3()
spe_ovarian <- HumanOvarianCancerVP()

Alternatively, data can be loaded directly from Bioconductor’s ExperimentHub as follows. First, we initialize a hub instance and store the complete list of records in a variable eh. Using query(), we then identify any records made available by the VectraPolarisData package, as well as their accession IDs (EH7311 for the lung cancer data). Finally, we can load the data into R via eh[[id]], where id is the identifier of the data entry we’d like to load. For example:

library(ExperimentHub)
eh <- ExperimentHub()        # initialize hub instance
q <- query(eh, "VectraPolarisData") # retrieve 'VectraPolarisData' records
id <- q$ah_id[1]             # specify dataset ID to load
spe <- eh[[id]]              # load specified dataset

2 Data Representation

Both the HumanLungCancerV3() and HumanOvarianCancerVP() datasets are stored as SpatialExperiment objects. This allows users to apply Bioconductor methods built for the SingleCellExperiment, SummarizedExperiment, and SpatialExperiment classes. See this ebook for more details on SpatialExperiment. To obtain cell-level tabular data that can be stored in this format, the raw multiplex .tiff images were preprocessed, segmented, and cell-phenotyped using inForm software from Akoya Biosciences.

The SpatialExperiment class was originally built for spatial transcriptomics data and follows the structure depicted in the schematic below (Righelli et al. 2021):

To adapt this class structure for multiplex imaging data, we use the slots in the following way:

assays slot: intensities, nucleus_intensities, membrane_intensities
sample_id slot: contains the image identifier. For HumanOvarianCancerVP this also identifies the subject, because there is one image per subject
colData slot: other cell-level characteristics, including marker intensity summaries, cell phenotypes, and cell shape characteristics
spatialCoordsNames slot: the x- and y-coordinates describing the location of the center point of each cell in the image
metadata slot: a data frame of subject-level clinical characteristics

3 Transforming to other data formats

The following code shows how to transform a SpatialExperiment object into a data.frame, if that is preferred for analysis. The example below uses the HumanOvarianCancerVP dataset.

library(dplyr)

# Assays slot: each assay is a markers x cells matrix;
# prefix the marker names so the three assays stay distinguishable
assays_slot <- assays(spe_ovarian)
intensities_df <- assays_slot$intensities
rownames(intensities_df) <- paste0("total_", rownames(intensities_df))
nucleus_intensities_df <- assays_slot$nucleus_intensities
rownames(nucleus_intensities_df) <- paste0("nucleus_", rownames(nucleus_intensities_df))
membrane_intensities_df <- assays_slot$membrane_intensities
rownames(membrane_intensities_df) <- paste0("membrane_", rownames(membrane_intensities_df))

# colData and spatialCoords
colData_df <- colData(spe_ovarian)
spatialCoords_df <- spatialCoords(spe_ovarian)

# clinical data
patient_level_df <- metadata(spe_ovarian)$clinical_data

cell_level_df <- as.data.frame(cbind(colData_df, 
                                     spatialCoords_df,
                                     t(intensities_df),
                                     t(nucleus_intensities_df),
                                     t(membrane_intensities_df))
                               )


ovarian_df <- full_join(patient_level_df, cell_level_df, by = "sample_id")
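The reshaping above can be hard to follow without the data at hand, so here is a minimal sketch on toy data. All values, cell names, and image names below are made up for illustration, and base R's merge() is used as a stand-in for dplyr::full_join() so that the sketch runs without extra packages. It shows why the assay matrices are transposed (they are markers x cells, while the cell-level table has one row per cell) and how patient-level values are repeated for every cell of an image after the join.

```r
# Toy stand-ins (hypothetical values) for the pieces extracted above.
# Assay matrices are markers x cells, like assays(spe_ovarian)$intensities.
intensities <- matrix(
  c(0.2, 1.5, 0.1, 0.9, 0.3, 2.0),
  nrow = 2,
  dimnames = list(c("cd3", "ck"), c("c1", "c2", "c3"))
)
rownames(intensities) <- paste0("total_", rownames(intensities))

# Cell-level table: one row per cell, like colData(spe_ovarian).
cells <- data.frame(
  sample_id = c("img_A", "img_A", "img_B"),
  cell_id = 1:3,
  row.names = c("c1", "c2", "c3")
)

# Patient-level table: one row per image,
# like metadata(spe_ovarian)$clinical_data.
patients <- data.frame(
  sample_id = c("img_A", "img_B"),
  death = c(0, 1)
)

# Transpose so that cells become rows, then bind column-wise.
cell_level <- cbind(cells, t(intensities))

# Base R stand-in for full_join(patients, cell_level, by = "sample_id").
toy_df <- merge(patients, cell_level, by = "sample_id", all = TRUE)
print(toy_df)
```

The result has one row per cell, with the patient-level column death repeated within each image.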

4 Citation Info

The objects provided in this package are rich data sources we encourage others to use in their own analyses. If you do include them in your peer-reviewed work, we ask that you cite our package and the original studies.

To cite the VectraPolarisData package, use:

@Manual{VectraPolarisData,
    title = {VectraPolarisData: Vectra Polaris and Vectra 3 multiplex single-cell imaging data},
    author = {Wrobel, J and Ghosh, T},
    year = {2022},
    note = {Bioconductor R package version 1.0},
  }

To cite the HumanLungCancerV3() data in bibtex format, use:

@article{johnson2021cancer,
  title={Cancer cell-specific MHCII expression as a determinant of the immune infiltrate organization and function in the non-small cell lung cancer tumor microenvironment.},
  author={Johnson, AM and Boland, JM and Wrobel, J and Kleczko, EK and Weiser-Evans, M and Hopp, K and Heasley, L and Clambey, ET and Jordan, K and Nemenoff, RA and others},
  journal={Journal of Thoracic Oncology: Official Publication of the International Association for the Study of Lung Cancer},
  year={2021}
}

To cite the HumanOvarianCancerVP() data, use:

@article{steinhart2021spatial,
  title={The spatial context of tumor-infiltrating immune cells associates with improved ovarian cancer survival},
  author={Steinhart, Benjamin and Jordan, Kimberly R and Bapat, Jaidev and Post, Miriam D and Brubaker, Lindsay W and Bitler, Benjamin G and Wrobel, Julia},
  journal={Molecular Cancer Research},
  volume={19},
  number={12},
  pages={1973--1979},
  year={2021},
  publisher={AACR}
}

5 Data Dictionaries

Detailed tables describing the contents of each dataset are listed below.

5.1 HumanLungCancerV3

In the table below note the following shorthand:

[marker] represents one of: cd3, cd8, cd14, cd19, cd68, ck, dapi, hladr,
[cell region] represents one of: entire_cell, membrane, nucleus

Table 1: data dictionary for HumanLungCancerV3

Variable	Slot	Description	Variable coding
[marker]	assays: intensities	mean total cell intensity for [marker]
[marker]	assays: nucleus_intensities	mean nucleus intensity for [marker]
[marker]	assays: membrane_intensities	mean membrane intensity for [marker]
sample_id		image identifier, also subject id for the ovarian data
cell_id	colData	cell identifier
slide_id		slide identifier, also the patient id for the lung data
tissue category		type of tissue (indicates a region of the image)	Stroma or Tumor
[cell region]_[marker]_min		min [cell region] intensity for [marker]
[cell region]_[marker]_max		max [cell region] intensity for [marker]
[cell region]_[marker]_std_dev		[cell region] std dev of intensity for [marker]
[cell region]_[marker]_total		total [cell region] intensity for [marker]
[cell region]_area_square_microns		[cell region] area in square microns
[cell region]_compactness		[cell region] compactness
[cell region]_minor_axis		[cell region] length of minor axis
[cell region]_major_axis		[cell region] length of major axis
[cell region]_axis_ratio		[cell region] ratio of major and minor axis
phenotype_[marker]		cell phenotype label as determined by inForm software
cell_x_position	spatialCoordsNames	cell x coordinate
cell_y_position	spatialCoordsNames	cell y coordinate
gender	metadata	gender	“M”, “F”
mhcII_status		MHCII status, from Johnson et al. 2021	“low”, “high”
age_at_diagnosis		age at diagnosis
stage_at_diagnosis		stage of the cancer when image was collected
stage_numeric		numeric version of stage variable
pack_years		pack-years of cigarette smoking
survival_days		time in days from date of diagnosis to date of death or censoring event
survival_status		did the participant pass away?	0 = no, 1 = yes
cause_of_death		cause of death
recurrence_or_lung_ca_death		did the participant have a recurrence or death event?	0 = no, 1 = yes
time_to_recurrence_days		time in days from date of diagnosis to first recurrent event
adjuvant_therapy		whether or not the participant received adjuvant therapy	“No”, “Yes”
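As an illustration of the shorthand used in Table 1, the sketch below expands the [cell region]_[marker]_[statistic] pattern into concrete column names. It assumes that the column names follow the pattern literally and uses only the markers and cell regions listed above; the actual column names in the object may differ, so treat the output as candidates to check rather than a definitive list.

```r
# Markers and cell regions as listed in the shorthand for Table 1.
markers <- c("cd3", "cd8", "cd14", "cd19", "cd68", "ck", "dapi", "hladr")
regions <- c("entire_cell", "membrane", "nucleus")
stats <- c("min", "max", "std_dev", "total")

# All region x marker x statistic combinations (first column varies fastest).
grid <- expand.grid(
  region = regions,
  marker = markers,
  stat = stats,
  stringsAsFactors = FALSE
)

# Assumed naming pattern: [cell region]_[marker]_[statistic].
intensity_cols <- paste(grid$region, grid$marker, grid$stat, sep = "_")

length(intensity_cols)
head(intensity_cols, 3)
```

With the lung data loaded as spe_lung, something like intersect(intensity_cols, colnames(colData(spe_lung))) would show which of these candidate names are actually present.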

5.2 HumanOvarianCancerVP

In the table below note the following shorthand:

[marker] represents one of: cd3, cd8, cd19, cd68, ck, dapi, ier3, ki67, pstat3
[cell region] represents one of: cytoplasm, membrane, nucleus

Table 2: data dictionary for HumanOvarianCancerVP

Variable	Slot	Description	Variable coding
[marker]	assays: intensities	mean total cell intensity for [marker]
[marker]	assays: nucleus_intensities	mean nucleus intensity for [marker]
[marker]	assays: membrane_intensities	mean membrane intensity for [marker]
sample_id		image identifier, also subject id for the ovarian data
cell_id	colData	cell identifier
slide_id		slide identifier
tissue category		type of tissue (indicates a region of the image)	Stroma or Tumor
[cell region]_[marker]_min		min [cell region] intensity for [marker]
[cell region]_[marker]_max		max [cell region] intensity for [marker]
[cell region]_[marker]_std_dev		[cell region] std dev of intensity for [marker]
[cell region]_[marker]_total		total [cell region] intensity for [marker]
[cell region]_area_square_microns		[cell region] area in square microns
[cell region]_compactness		[cell region] compactness
[cell region]_minor_axis		[cell region] length of minor axis
[cell region]_major_axis		[cell region] length of major axis
[cell region]_axis_ratio		[cell region] ratio of major and minor axis
cell_x_position	spatialCoordsNames	cell x coordinate
cell_y_position	spatialCoordsNames	cell y coordinate
diagnosis	metadata
primary		primary tumor from initial diagnosis?	0 = no, 1 = yes
recurrent		tumor from a recurrent event (not initial diagnosis tumor)?	0 = no, 1 = yes
treatment_effect		was tumor treated with chemo prior to imaging?	0 = no, 1 = yes
stage		stage of the cancer when image was collected	I, II, III, IV
grade		grade of cancer severity (nearly all 3)
survival_time		time in months from date of diagnosis to date of death or censoring event
death		did the participant pass away?	0 = no, 1 = yes
BRCA_mutation		does the participant have a BRCA mutation?	0 = no, 1 = yes
age_at_diagnosis		age at diagnosis
time_to_recurrence		time in months from date of diagnosis to first recurrent event
parpi_inhibitor		whether or not the participant received a PARP inhibitor (PARPi)	N = no, Y = yes
debulking		subjective rating of how the tumor removal process went	optimal, suboptimal, interval

Note: the debulking variable is coded as optimal if the surgeon believes the tumor area was reduced to 1 cm or below; suboptimal if the surgeon was unable to remove a significant amount of tumor for various reasons; and interval if tumor removal came after three cycles of chemo.
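The variable codings in Table 2 can be turned into more convenient R types before analysis. The sketch below is one possible way to do this, shown on a small made-up clinical table; the rows and the derived column survival_years are hypothetical, and only the column names and codings are taken from the table above.

```r
# Hypothetical clinical table using the codings from Table 2.
clinical <- data.frame(
  sample_id = c("img_A", "img_B", "img_C"),
  death = c(0, 1, 1),                     # 0 = no, 1 = yes
  survival_time = c(60, 24, 36),          # months
  parpi_inhibitor = c("N", "Y", "N"),     # N = no, Y = yes
  debulking = c("optimal", "suboptimal", "interval"),
  stringsAsFactors = FALSE
)

# Recode 0/1 and N/Y indicators as logical values.
clinical$death <- clinical$death == 1
clinical$parpi_inhibitor <- clinical$parpi_inhibitor == "Y"

# Store debulking as a factor with the three documented levels.
clinical$debulking <- factor(
  clinical$debulking,
  levels = c("optimal", "suboptimal", "interval")
)

# Survival time is recorded in months; derive years for convenience.
clinical$survival_years <- clinical$survival_time / 12

str(clinical)
```

The same recoding could be applied to metadata(spe_ovarian)$clinical_data, after checking that its columns match the data dictionary.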

6 Session Info

sessionInfo()
#> R version 4.5.0 RC (2025-04-04 r88126)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] dplyr_1.1.4                 VectraPolarisData_1.12.0   
#>  [3] SpatialExperiment_1.18.0    SingleCellExperiment_1.30.0
#>  [5] SummarizedExperiment_1.38.0 Biobase_2.68.0             
#>  [7] GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
#>  [9] IRanges_2.42.0              S4Vectors_0.46.0           
#> [11] MatrixGenerics_1.20.0       matrixStats_1.5.0          
#> [13] ExperimentHub_2.16.0        AnnotationHub_3.16.0       
#> [15] BiocFileCache_2.16.0        dbplyr_2.5.0               
#> [17] BiocGenerics_0.54.0         generics_0.1.3             
#> [19] BiocStyle_2.36.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] KEGGREST_1.48.0         rjson_0.2.23            xfun_0.52              
#>  [4] bslib_0.9.0             lattice_0.22-7          vctrs_0.6.5            
#>  [7] tools_4.5.0             curl_6.2.2              tibble_3.2.1           
#> [10] AnnotationDbi_1.70.0    RSQLite_2.3.9           blob_1.2.4             
#> [13] pkgconfig_2.0.3         Matrix_1.7-3            lifecycle_1.0.4        
#> [16] GenomeInfoDbData_1.2.14 compiler_4.5.0          Biostrings_2.76.0      
#> [19] htmltools_0.5.8.1       sass_0.4.10             yaml_2.3.10            
#> [22] pillar_1.10.2           crayon_1.5.3            jquerylib_0.1.4        
#> [25] DelayedArray_0.34.0     cachem_1.1.0            magick_2.8.6           
#> [28] abind_1.4-8             mime_0.13               tidyselect_1.2.1       
#> [31] digest_0.6.37           purrr_1.0.4             bookdown_0.43          
#> [34] BiocVersion_3.21.1      grid_4.5.0              fastmap_1.2.0          
#> [37] SparseArray_1.8.0       cli_3.6.4               magrittr_2.0.3         
#> [40] S4Arrays_1.8.0          withr_3.0.2             filelock_1.0.3         
#> [43] UCSC.utils_1.4.0        rappdirs_0.3.3          bit64_4.6.0-1          
#> [46] rmarkdown_2.29          XVector_0.48.0          httr_1.4.7             
#> [49] bit_4.6.0               png_0.1-8               memoise_2.0.1          
#> [52] evaluate_1.0.3          knitr_1.50              rlang_1.1.6            
#> [55] Rcpp_1.0.14             glue_1.8.0              DBI_1.2.3              
#> [58] BiocManager_1.30.25     jsonlite_2.0.0          R6_2.6.1