1 Introduction

Single-cell RNA sequencing (scRNA-seq) enables expression quantitative trait locus (eQTL) analysis at cellular resolution, offering new opportunities to uncover regulatory variants with cell-type-specific effects. However, existing tools are often limited in functionality, input compatibility, or scalability for sparse single-cell data. To address these challenges, we developed scQTLtools, a comprehensive R/Bioconductor package that facilitates end-to-end single-cell eQTL analysis, from preprocessing to visualization. The toolkit supports flexible input formats, including Seurat and SingleCellExperiment objects, handles both binary and three-class genotype encodings, and provides dedicated functions for gene expression normalization, SNP and gene filtering, eQTL mapping, and versatile result visualization. To accommodate diverse data characteristics, scQTLtools implements three statistical models—linear regression, Poisson regression, and zero-inflated negative binomial regression. We applied scQTLtools to scRNA-seq data from human acute myeloid leukemia and identified eQTLs with regulatory effects that varied across cell types. Visualization of SNP–gene pairs revealed both positive and negative associations between genotype and gene expression. These results demonstrate the ability of scQTLtools to uncover cell-type-specific regulatory variation that is often missed by bulk eQTL analyses. Overall, scQTLtools offers a robust, flexible, and user-friendly framework for dissecting genotype-expression relationships in heterogeneous cellular populations.

1.1 Rationale for Bioconductor Submission

By seeking inclusion in Bioconductor, we aim to integrate scQTLtools into a well-established ecosystem that is widely used by researchers in bioinformatics. Bioconductor’s rigorous standards for package quality and its focus on reproducibility will enhance the credibility of scQTLtools. Additionally, being part of Bioconductor will provide access to a broader user base and foster collaboration with other developers, contributing to the ongoing improvement and validation of the package.

2 Installation

if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("scQTLtools")

3 Overview of the package

The functions in scQTLtools can be categorized into data input, data pre-process, sc-eQTL calling and visualization modules. The functions and their brief descriptions are summarized below.

3.1 General Workflow

Each module is summarized as shown below.

scQTLtools requires two key input data: a single-cell gene expression dataset and a corresponding SNP genotype matrix. The single-cell gene expression dataset can be either a gene expression matrix or an object such as a Seurat v4 object or a Bioconductor SingleCellExperiment object. The input genotype matrix should follow a 0/1/2/3 encoding scheme: 1 for homozygous reference genotype, 2 for homozygous alternative genotype, 3 for heterozygous genotype, and 0 for missing values. Moreover, scQTLtools can support a simplified 0/1/2 encoding scheme, where 2 denotes a non-reference genotype. Additionally, the package includes functionality to normalize the raw single-cell gene expression matrix, and filter SNP–gene pairs. After pre-process, scQTLtools implements the callQTL() function to identify sc-eQTLs. Moreover, visualization at the single-cell level demonstrates the specificity of eQTLs across distinct cell types or cellular states.

3.2 Comparison and advantages compared to similar works

We compared scQTLtools to other packages with similar functionality, including eQTLsingle, SCeQTL, Matrix eQTL, and iBMQ, as shown in the table below.

Among these tools, scQTLtools stands out for its comprehensive features:

  1. scQTLtools accepts SingleCellExperiment object and Seurat object as input data formats, which are particularly beneficial for users working with single-cell RNA-seq data, and promote the interoperability with the current Bioconductor ecosystem.

  2. scQTLtools supports both binary and three-class genotype encodings, enhancing its applicability across different genetic studies.

  3. scQTLtools offers extensive data pre-processing capabilities, including quality control filtering for SNPs and genes, and normalization of raw gene expression matrix. Users can also customize the genomic window for defining SNP–gene pairs, enabling flexible cis-eQTL discovery.

  4. scQTLtools provides three commonly used statistical models, which cater to various data distributions and analysis needs. This diversity allows users to select the most appropriate model for their specific dataset.

  5. scQTLtools includes a wide range of visualization tools, which facilitate detailed exploration and interpretation of eQTL results at the single-cell level.

  6. scQTLtools supports grouping by both cell type and cell state, which is crucial for analyzing the nuanced effects of genetic variants on gene expression within heterogeneous cell populations.

Overall, scQTLtools offers a comprehensive suite of features that enhance the analysis and interpretation of eQTLs.

4 Required input files

The input file requires a single-cell gene expression dataset and a corresponding SNP genotype matrix. The single-cell gene expression dataset can be either a gene expression matrix or an object such as a Seurat v4 object or a Bioconductor SingleCellExperiment object.

  • gene expression matrix: describes gene expressions, each row represents a gene and each column represents a cell.
  • Seurat object/SingleCellExperiment object: a Seurat/SingleCellExperiment object.
  • SNP genotype matrix: A matrix where each row represents a variant and each column a cell, using a 0/1/2/3 encoding scheme.

The columns (cells) of the genotype matrix should correspond to the columns (cells) of the gene expression matrix.

Example

library(scQTLtools)
# gene expression matrix
data(GeneData)
# SeuratObject
data(Seurat_obj)
# load the genotype data
data(SNPData)
data(SNPData2)

5 Create eQTL object

The createQTLObject class is an R object designed to store data related to eQTL analysis, encompassing data lists, result data frames, and slots for biClassify, species, and group information.

Example

eqtl_matrix <- createQTLObject(
    snpMatrix = SNPData,
    genedata = GeneData,
    biClassify = FALSE,
    species = 'human',
    group = NULL)

Users can set biClassify to TRUE to change the genotype encoding mode.

Example

eqtl_matrix_bi <- createQTLObject(
    snpMatrix = SNPData,
    genedata = GeneData,
    biClassify = TRUE,
    species = 'human',
    group = NULL)

Users can use Seurat object instead of gene expression matrix.

Example

eqtl_seurat <- createQTLObject(
    snpMatrix = SNPData2,
    genedata = Seurat_obj,
    biClassify = FALSE,
    species = 'human',
    group = "celltype")

Users can also take SingleCellExperiment object as input.

Example

# Create a SingleCellExperiment object
library(SingleCellExperiment)
sce <- SingleCellExperiment(assays = list(counts = GeneData))
eqtl_sce <- createQTLObject(
    snpMatrix = SNPData,
    genedata = sce,
    biClassify = FALSE,
    species = 'human',
    group = NULL)

6 Normalize the raw gene expression matrix

Use normalizeGene() to normalize the raw gene expression matrix.

Example

eqtl_matrix  <- normalizeGene(
    eQTLObject = eqtl_matrix, 
    method = "logNormalize")
eqtl_sce  <- normalizeGene(
    eQTLObject = eqtl_sce, 
    method = "logNormalize")

7 Identify the valid SNP–gene pairs

Here we use filterGeneSNP() to filter SNP–gene pairs.

Example

eqtl_matrix <- filterGeneSNP(
    eQTLObject = eqtl_matrix,
    snpNumOfCellsPercent = 2,
    expressionMin = 0,
    expressionNumOfCellsPercent = 2)
eqtl_seurat <- filterGeneSNP(
    eQTLObject = eqtl_seurat,
    snpNumOfCellsPercent = 2,
    expressionMin = 0,
    expressionNumOfCellsPercent = 2)
eqtl_sce <- filterGeneSNP(
    eQTLObject = eqtl_sce,
    snpNumOfCellsPercent = 2,
    expressionMin = 0,
    expressionNumOfCellsPercent = 2)

8 Identify single-cell eQTLs

Here we use callQTL() to perform single cell eQTL analysis.

Example

eqtl1_matrix <- callQTL(
    eQTLObject = eqtl_matrix,
    gene_ids = NULL,
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "linear",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.1)
eqtl1_seurat <- callQTL(
    eQTLObject = eqtl_seurat,
    gene_ids = NULL,
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "linear",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.025)
eqtl1_sce <- callQTL(
    eQTLObject = eqtl_sce,
    gene_ids = NULL,
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "linear",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.025)

Users can use the parameter gene_ids to select one or several genes of interest for identifying sc-eQTLs.

Example

eqtl2_matrix <- callQTL(
    eQTLObject = eqtl_matrix,
    gene_ids = c("CNN2", 
                "RNF113A", 
                "SH3GL1", 
                "INTS13", 
                "PLAU"),
    downstream = NULL,
    upstream = NULL,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "poisson",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.1)

Users can also set upstream and downstream to specify SNPs proximal to the gene in the genome.

Example

eqtl3_matrix <- callQTL(
    eQTLObject = eqtl_matrix,
    gene_ids = NULL,
    downstream = -9e7,
    upstream = 2e8,
    gene_mart = NULL,
    snp_mart = NULL,
    pAdjustMethod = "bonferroni",
    useModel = "poisson",
    pAdjustThreshold = 0.05,
    logfcThreshold = 0.05)

9 Visualize the results.

Use visualizeQTL() to visualize sc-eQTL results. Four plot types are supported: “histplot”, “violin”, “boxplot”, or “QTLplot”.

Example

violin plot:

visualizeQTL(
    eQTLObject = eqtl1_matrix,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = NULL,
    plottype = "QTLplot",
    removeoutlier = TRUE)

QTLplot:

By incorporating cell annotation from Seurat or SingleCellExperiment objects, scQTLtools can reveal cell-type-specific patterns of eQTL effects.

visualizeQTL(
    eQTLObject = eqtl1_seurat,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = NULL,
    plottype = "QTLplot",
    removeoutlier = TRUE)

In addition, the parameter groupName is used to specify a particular single-cell group of interest.

visualizeQTL(
    eQTLObject = eqtl1_seurat,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = "GMP",
    plottype = "QTLplot",
    removeoutlier = TRUE)

boxplot:

visualizeQTL(
    eQTLObject = eqtl1_sce,
    SNPid = "1:632647",
    Geneid = "RPS27",
    groupName = NULL,
    plottype = "boxplot",
    removeoutlier = TRUE)

10 References

Appendix

  1. Miao, Z., Deng, K.e., Wang, X., Zhang, X., Berger, B., 2018. DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics 34 (18), 3223–3224. https://doi.org/10.1093/bioinformatics/bty332
  2. Hu, Y., Xi, X., Yang, Q., Zhang, X., 2020. SCeQTL: An R package for identifying eQTL from single-cell parallel sequencing data. BMC Bioinformatics 21, 1–12. https://doi.org/10.1186/s12859-020-3534-6
  3. Nathan, A., Asgari, S., Ishigaki, K. et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128 (2022). https://doi.org/10.1038/s41586-022-04713-1
  4. Ma, T. , Li, H. , & Zhang, X. . (2021). Discovering single-cell eQTLs from scRNA-seq data only. Cold Spring Harbor Laboratory. https://doi.org/10.1016/j.gene.2022.146520
  5. Schmiedel, B. J., Gonzalez-Colin, C., Fajardo, V., Rocha, J., Madrigal, A., Ramírez-Suástegui, C., Bhattacharyya, S., Simon, H., Greenbaum, J. A., Peters, B., Seumois, G., Ay, F., Chandra, V., & Vijayanand, P. (2022). Single-cell eQTL analysis of activated T cell subsets reveals activation and cell type-dependent effects of disease-risk variants. Science immunology, 7(68), eabm2508. https://doi.org/10.1126/sciimmunol.abm2508
  6. Imholte, G. C. , Marie-Pier, S. B. , Labbe Aurélie,
    Deschepper, C. F. , & Raphael, G. . (2013). Ibmq: a r/bioconductor package for integrated bayesian modeling of eqtl data. Bioinformatics(21), 2797-2798. https://doi.org/10.1093/bioinformatics/btt485
  7. Shabalin A. A. (2012). Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics (Oxford, England), 28(10), 1353–1358. https://doi.org/10.1093/bioinformatics/bts163

A Session Info

sessionInfo()
## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SingleCellExperiment_1.31.0 SummarizedExperiment_1.39.0
##  [3] Biobase_2.69.0              GenomicRanges_1.61.0       
##  [5] GenomeInfoDb_1.45.4         IRanges_2.43.0             
##  [7] S4Vectors_0.47.0            BiocGenerics_0.55.0        
##  [9] generics_0.1.4              MatrixGenerics_1.21.0      
## [11] matrixStats_1.5.0           scQTLtools_1.1.5           
## [13] BiocStyle_2.37.0           
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.2.3            httr2_1.1.2          biomaRt_2.65.0      
##   [4] rlang_1.1.6          magrittr_2.0.3       gamlss_5.4-22       
##   [7] compiler_4.5.0       RSQLite_2.3.11       png_0.1-8           
##  [10] vctrs_0.6.5          stringr_1.5.1        pkgconfig_2.0.3     
##  [13] crayon_1.5.3         fastmap_1.2.0        magick_2.8.6        
##  [16] dbplyr_2.5.0         XVector_0.49.0       labeling_0.4.3      
##  [19] rmarkdown_2.29       UCSC.utils_1.5.0     tinytex_0.57        
##  [22] purrr_1.0.4          bit_4.6.0            xfun_0.52           
##  [25] cachem_1.1.0         jsonlite_2.0.0       progress_1.2.3      
##  [28] blob_1.2.4           DelayedArray_0.35.1  BiocParallel_1.43.3 
##  [31] VGAM_1.1-13          parallel_4.5.0       prettyunits_1.2.0   
##  [34] R6_2.6.1             bslib_0.9.0          stringi_1.8.7       
##  [37] RColorBrewer_1.1-3   limma_3.65.1         parallelly_1.44.0   
##  [40] jquerylib_0.1.4      GOSemSim_2.35.0      Rcpp_1.0.14         
##  [43] bookdown_0.43        knitr_1.50           future.apply_1.11.3 
##  [46] R.utils_2.13.0       Matrix_1.7-3         splines_4.5.0       
##  [49] tidyselect_1.2.1     dichromat_2.0-0.1    abind_1.4-8         
##  [52] yaml_2.3.10          gamlss.dist_6.1-1    codetools_0.2-20    
##  [55] curl_6.2.3           listenv_0.9.1        lattice_0.22-7      
##  [58] tibble_3.2.1         withr_3.0.2          KEGGREST_1.49.0     
##  [61] evaluate_1.0.3       survival_3.8-3       future_1.49.0       
##  [64] BiocFileCache_2.99.5 xml2_1.3.8           Biostrings_2.77.1   
##  [67] filelock_1.0.3       pillar_1.10.2        BiocManager_1.30.25 
##  [70] sp_2.2-0             hms_1.1.3            ggplot2_3.5.2       
##  [73] scales_1.4.0         globals_0.18.0       glue_1.8.0          
##  [76] tools_4.5.0          locfit_1.5-9.12      fs_1.6.6            
##  [79] dotCall64_1.2        grid_4.5.0           AnnotationDbi_1.71.0
##  [82] patchwork_1.3.0      nlme_3.1-168         cli_3.6.5           
##  [85] rappdirs_0.3.3       spam_2.11-1          S4Arrays_1.9.1      
##  [88] dplyr_1.1.4          gtable_0.3.6         R.methodsS3_1.8.2   
##  [91] yulab.utils_0.2.0    DESeq2_1.49.1        sass_0.4.10         
##  [94] digest_0.6.37        gamlss.data_6.0-6    progressr_0.15.1    
##  [97] SparseArray_1.9.0    SeuratObject_5.1.0   farver_2.1.2        
## [100] memoise_2.0.1        htmltools_0.5.8.1    R.oo_1.27.1         
## [103] lifecycle_1.0.4      httr_1.4.7           statmod_1.5.0       
## [106] GO.db_3.21.0         MASS_7.3-65          bit64_4.6.0-1