In general, I recommend against interpreting the fraction of variance explained by residuals. This fraction is driven by:
If you have additional variables that explain variation in measured gene expression, you should include them to avoid confounding with your variable of interest. But a particular residual fraction is not ‘good’ or ‘bad’, and it is not a good metric for determining whether more variables should be included.
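As a rough illustration of why the residual fraction is hard to interpret, the sketch below simulates one response under a model that already contains the only real source of variation, then changes nothing but the measurement noise. It uses base R only; the names `group`, `signal`, and `residual_fraction` are hypothetical and not part of any package, and `1 - R^2` from `lm()` stands in for the residual fraction under a simple fixed-effects model.

```r
# Hypothetical simulation: the model is correctly specified in both cases,
# only the measurement noise differs.
set.seed(1)
n <- 200
group <- factor(rep(c("A", "B"), each = n / 2))  # the only true source of variation
signal <- ifelse(group == "A", 0, 1)

# Fraction of total variance left in the residuals of a linear model (1 - R^2)
residual_fraction <- function(noise_sd) {
  y <- signal + rnorm(n, sd = noise_sd)
  fit <- lm(y ~ group)
  1 - summary(fit)$r.squared
}

low_noise  <- residual_fraction(0.2)  # precise measurements
high_noise <- residual_fraction(2)    # noisy measurements
print(c(low_noise = low_noise, high_noise = high_noise))
```

In this sketch the residual fraction is much larger in the noisy case even though no variable is missing from the model, so a large value by itself does not indicate that more variables should be added.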
See the GitHub page for up-to-date responses to users’ questions.
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.35 R6_2.5.1 fastmap_1.2.0 xfun_0.44 cachem_1.1.0
## [6] knitr_1.47 htmltools_0.5.8.1 rmarkdown_2.27 lifecycle_1.0.4 cli_3.6.2
## [11] sass_0.4.9 jquerylib_0.1.4 compiler_4.4.0 tools_4.4.0 evaluate_0.24.0
## [16] bslib_0.7.0 yaml_2.3.8 rlang_1.1.4 jsonlite_1.8.8