1 Introduction

The ambitions of collaborative single cell biology will only be achieved through the coordinated efforts of many groups working to clarify cell types and dynamics across an array of functional and environmental contexts. The use of formal ontology in this pursuit is well motivated, and research progress has already been substantial.

Bakken et al. (2017) discuss “strategies for standardized cell type representations based on the data outputs from [high-content flow cytometry and single cell RNA sequencing], including ‘context annotations’ in the form of standardized experiment metadata about the specimen source analyzed and marker genes that serve as the most useful features in machine learning-based cell type classification models.” Aevermann et al. (2018) describe how the FAIR principles can be implemented using statistical identification of necessary and sufficient conditions for determining cell class membership. They propose that the Cell Ontology can be transformed into a broadly usable knowledgebase by incorporating accurate marker gene signatures for cell classes.

In this vignette, we review key concepts and tasks required to make progress in the adoption and application of ontological discipline in Bioconductor-oriented data analysis.

We’ll start by setting up some package attachments and ontology objects.

library(ontoProc)
library(ontologyPlot)
library(BiocStyle)  # for package references
cl = getOnto("cellOnto", "2021") # for continuity --    has_high_plasma_membrane_amount: list
go = getOnto("goOnto", "2021")  # if updated, some assertions will fail...
pr = getOnto("Pronto", "2021")  # important case change

2 Scope of package

2.1 OWL interface

As of version 1.99.0, facilities are present to import any valid OWL ontology. We use basilisk to incorporate functionality from owlready2 and bioregistry. One way to identify a large number of ontologies available for ingestion is to query bioregistry.

br = bioregistry_ols_resources()
library(DT)
datatable(br[,c(2,3)])
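
Beyond browsing the interactive table, a keyword search over the text columns is a quick way to narrow the registry down. The sketch below uses a small hand-made data frame as a stand-in for `br`. The column names (`prefix`, `name`, `download_owl`), the rows, and the helper `find_resources` are illustrative assumptions, not the actual layout returned by `bioregistry_ols_resources()`.

```r
# Toy stand-in for the registry table; columns and rows are hypothetical.
br_toy <- data.frame(
  prefix = c("aeo", "caro", "cl"),
  name = c("Anatomical Entity Ontology",
           "Common Anatomy Reference Ontology",
           "Cell Ontology"),
  download_owl = c("http://purl.obolibrary.org/obo/aeo.owl",
                   "http://purl.obolibrary.org/obo/caro.owl",
                   "http://purl.obolibrary.org/obo/cl.owl"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where any character column matches `pattern`
# (case-insensitive), without assuming particular column names.
find_resources <- function(tab, pattern) {
  is_chr <- vapply(tab, is.character, logical(1))
  hits <- lapply(tab[is_chr], grepl, pattern = pattern, ignore.case = TRUE)
  tab[Reduce(`|`, hits), , drop = FALSE]
}

anat <- find_resources(br_toy, "anatom")
anat$prefix
```

Searching every character column rather than a named one keeps the helper usable even if the real table's column names differ from those assumed here.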

We can use the URLs given in this table to explore ontologies of interest. For example, the AEO (anatomical entity ontology) extends CARO (the common anatomy reference ontology). What sorts of terms are regarded as extensions?

aeo = owl2cache(url="http://purl.obolibrary.org/obo/aeo.owl") # localize OWL
## resource BFC4477 already in cache from http://purl.obolibrary.org/obo/aeo.owl
aeoinr = setup_entities2(aeo)
set.seed(1234)
suppressWarnings({ # silence "zero-length arrow is of indeterminate angle" warnings from plotting
onto_plot2(aeoinr, sample(grep("AEO", names(aeoinr$name), value=TRUE),12))
})
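
The term selection in the plotting call combines two base R idioms: `grep(..., value=TRUE)` to pick identifiers by pattern, and `sample()` after `set.seed()` to draw a reproducible subset. The sketch below isolates that step on a toy named vector. The identifiers and labels are made up for illustration and only mimic the identifier-to-label shape of `aeoinr$name`.

```r
# Toy named vector: names are term identifiers, values are labels.
# All entries are hypothetical.
term_names <- c(
  "AEO_0000001"  = "toy term A",
  "CARO_0000002" = "toy term B",
  "AEO_0000003"  = "toy term C",
  "BFO_0000004"  = "toy term D",
  "AEO_0000005"  = "toy term E"
)

# value = TRUE returns the matching identifiers themselves,
# not their integer positions.
aeo_ids <- grep("AEO", names(term_names), value = TRUE)

# Fixing the seed makes the random subset the same on every run.
set.seed(1234)
picked <- sample(aeo_ids, 2)

# Look up the labels of the sampled terms.
term_names[picked]
```

Because the pattern is unanchored, it matches "AEO" anywhere in an identifier. Anchoring it (for example `"^AEO"`) would be stricter, provided the identifiers really begin with that prefix.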