1 Quick start for Rcollectl

library("Rcollectl")

Collectl is a Unix-based tool that measures system resource consumption of various types. We provide a demonstration output with the package:

lk = cl_parse(system.file("demotab/demo_1123.tab.gz", package="Rcollectl"))
dim(lk)
#> [1] 478  71
attr(lk, "meta")
#>  [1] "################################################################################"
#>  [2] "# Collectl:   V4.3.1-1  HiRes: 1  Options: -scdnm -P -f./col2.txt "              
#>  [3] "# Host:       stvjc-XPS-13-9300  DaemonOpts: "                                   
#>  [4] "# Booted:     1606052236.57 [20201122-08:37:16]"                                 
#>  [5] "# Distro:     debian bullseye/sid, Ubuntu 20.04.1 LTS  Platform: "               
#>  [6] "# Date:       20201123-144054  Secs: 1606160454 TZ: -0500"                       
#>  [7] "# SubSys:     cdnm Options:  Interval: 1 NumCPUs: 8 [HYPER] NumBud: 0 Flags: i"  
#>  [8] "# Filters:    NfsFilt:  EnvFilt:  TcpFilt: ituc"                                 
#>  [9] "# HZ:         100  Arch: x86_64-linux-gnu-thread-multi PageSize: 4096"           
#> [10] "# Cpu:        GenuineIntel Speed(MHz): 1745.513 Cores: 4  Siblings: 8 Nodes: 1"  
#> [11] "# Kernel:     5.4.0-54-generic  Memory: 15969160 kB  Swap: 2097148 kB"           
#> [12] "# NumDisks:   1 DiskNames: nvme0n1"                                              
#> [13] "# NumNets:    4 NetNames: lo:?? enxc03ebaccccfd:100 docker0:?? wlp0s20f3:??"     
#> [14] "################################################################################"
lk[1:5,1:5]
#>      #Date     Time CPU_User% CPU_Nice% CPU_Sys%
#> 1 20201123 14:40:56         2         0        1
#> 2 20201123 14:40:57         1         0        0
#> 3 20201123 14:40:58         1         0        0
#> 4 20201123 14:40:59         2         0        0
#> 5 20201123 14:41:00         3         0        1
plot_usage(lk)

From this display, we can see that a burst of network activity around 14:43 is followed by consumption of CPU, memory, and disk resources. CPU activity never exceeds 30%. Memory consumption was already relatively high when sampling began and grew to about 15.5 GB. In total, 250 MB were written to disk over the entire interval.
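The plot can also be checked numerically. The sketch below uses only the five rows printed by `lk[1:5,1:5]` and two header lines from `attr(lk, "meta")`, so the numbers are illustrative. On the full data you would substitute `lk` and `attr(lk, "meta")`.

Two points are assumptions:

- Treating user + nice + system as "active" CPU is an approximation. collectl reports further CPU columns, and `plot_usage()` may combine them differently.
- `meta_field()` is a hypothetical helper, not part of Rcollectl.

```r
# Five rows copied from lk[1:5, 1:5] above; check.names = FALSE keeps the
# "#Date" and "%" column names exactly as cl_parse() prints them.
usage <- data.frame(
  `#Date` = rep(20201123L, 5),
  Time = c("14:40:56", "14:40:57", "14:40:58", "14:40:59", "14:41:00"),
  `CPU_User%` = c(2, 1, 1, 2, 3),
  `CPU_Nice%` = c(0, 0, 0, 0, 0),
  `CPU_Sys%` = c(1, 0, 0, 0, 1),
  check.names = FALSE
)

# Approximate "active" CPU per sample as user + nice + system (an assumption).
active <- usage[["CPU_User%"]] + usage[["CPU_Nice%"]] + usage[["CPU_Sys%"]]
max_active <- max(active)

# Two header lines copied from attr(lk, "meta") above.
meta <- c(
  "# SubSys:     cdnm Options:  Interval: 1 NumCPUs: 8 [HYPER] NumBud: 0 Flags: i",
  "# Kernel:     5.4.0-54-generic  Memory: 15969160 kB  Swap: 2097148 kB"
)

# Hypothetical helper: return the first non-space token following "<key>:"
# in any header line, or NA if the key does not appear.
meta_field <- function(meta, key) {
  hits <- regmatches(meta, regexec(paste0(key, ":\\s+(\\S+)"), meta))
  hits <- Filter(length, hits)  # drop lines with no match
  if (length(hits) == 0) NA_character_ else hits[[1]][2]
}

ncpu <- as.integer(meta_field(meta, "NumCPUs"))
# Treat collectl's "kB" as 1024 bytes, as /proc/meminfo does, and report GiB.
mem_gib <- as.numeric(meta_field(meta, "Memory")) / 1024^2
```

Here `max_active` is 4, `ncpu` is 8 and `mem_gib` is about 15.23. That is the installed memory of the demo machine, not the amount in use.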

To generate a display like this, use the commands shown below. Any string can serve as the [target file prefix]. For example, cl_start("foo") produces a file foo-[hostname]-[yyyymmdd].tab.gz containing timing and consumption data. Here [hostname] is the value reported by hostname and [yyyymmdd] is the current date. Use different target file prefixes for runs you wish to distinguish.
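The naming rule above can be expressed as a small function. This is a minimal sketch: `expected_cl_file()` and `runs_for_prefix()` are hypothetical helpers, not Rcollectl functions. It also assumes that `Sys.info()[["nodename"]]` matches the output of `hostname`, which is usual but not guaranteed.

```r
# Hypothetical helper mirroring the naming rule described above:
#   <prefix>-<hostname>-<yyyymmdd>.tab.gz
expected_cl_file <- function(prefix, host = Sys.info()[["nodename"]],
                             date = Sys.Date()) {
  paste0(prefix, "-", host, "-", format(date, "%Y%m%d"), ".tab.gz")
}

# Hypothetical helper listing files written under a given prefix, on any date.
# The prefix is used inside a regular expression, so escape any regex
# metacharacters it contains.
runs_for_prefix <- function(prefix, path = ".") {
  dir(path, pattern = paste0("^", prefix, "-.*\\.tab\\.gz$"))
}

expected_cl_file("foo", host = "stvjc-XPS-13-9300", date = as.Date("2020-11-23"))
# "foo-stvjc-XPS-13-9300-20201123.tab.gz"
```

Because the date is part of the name, runs with the same prefix on different days produce separate files. `runs_for_prefix()` lists all of them.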

id = cl_start("foo")   # "foo" is the target file prefix; any string will do
# ... run the R task to be measured until it is complete ...
cl_stop(id)
usage_df = cl_parse(cl_result_path(id))
# analyze or filter usage_df (for example, to trim away time
# related to task delay or to the delay of `cl_stop`)
plot_usage(usage_df)
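One way to do the trimming mentioned in the comment is to build a timestamp from the `#Date` and `Time` columns and keep a time window. This is a sketch under stated assumptions:

- The columns are named and formatted as in the `lk[1:5,1:5]` printout.
- `trim_usage()` is a hypothetical helper, not part of Rcollectl.
- `tz = ""` means the session's local time zone, which matches collectl's timestamps only when R runs on the measured machine.

```r
# Hypothetical helper: combine "#Date" (yyyymmdd) and "Time" (HH:MM:SS) into
# POSIXct, then keep only rows sampled within [from, to].
trim_usage <- function(df, from, to, tz = "") {
  stamp <- as.POSIXct(paste(df[["#Date"]], df[["Time"]]),
                      format = "%Y%m%d %H:%M:%S", tz = tz)
  keep <- !is.na(stamp) & stamp >= from & stamp <= to
  df[keep, , drop = FALSE]
}

# Five rows in the layout of lk[1:5, 1:5] above.
demo <- data.frame(
  `#Date` = rep(20201123L, 5),
  Time = c("14:40:56", "14:40:57", "14:40:58", "14:40:59", "14:41:00"),
  `CPU_User%` = c(2, 1, 1, 2, 3),
  check.names = FALSE
)

trimmed <- trim_usage(demo,
                      from = as.POSIXct("2020-11-23 14:40:57", tz = "UTC"),
                      to   = as.POSIXct("2020-11-23 14:40:59", tz = "UTC"),
                      tz   = "UTC")
trimmed$Time  # "14:40:57" "14:40:58" "14:40:59"
```

The trimmed data frame could then be passed to `plot_usage()` in place of `usage_df`. This assumes `plot_usage()` accepts any row subset of `cl_parse()` output, which has not been verified here.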

2 Timestamps

Yubo Cheng has added functionality for annotating usage plots with labels that mark task phases. Here is example code showing how to introduce annotations into the time profile.

id <- cl_start()
Sys.sleep(2)
# code
cl_timestamp(id, "step1")
Sys.sleep(2)
# code
Sys.sleep(2)
cl_timestamp(id, "step2")
Sys.sleep(2)
# code
Sys.sleep(2)
cl_timestamp(id, "step3")
Sys.sleep(2)
# code
cl_stop(id)
path <- cl_result_path(id)

plot_usage(cl_parse(path)) +
  cl_timestamp_layer(path) +
  cl_timestamp_label(path) +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5, hjust = 1))
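In the example above, an error in any `# code` step would skip `cl_stop(id)` and leave collectl running. A minimal sketch of a guarded version follows. `measure_phases()` is a hypothetical wrapper, not part of Rcollectl. It only calls the functions used above (`cl_start`, `cl_timestamp`, `cl_stop`, `cl_result_path`) with the argument forms shown there.

As a design choice, it timestamps the start of each phase rather than placing labels between code chunks as the example does.

```r
# Hypothetical wrapper (not part of Rcollectl): run a named list of
# zero-argument functions under collectl, timestamping the start of each
# phase with its name. on.exit() stops collectl if a phase throws an error;
# on success, collection is stopped before the result path is looked up,
# matching the order used in the example above.
measure_phases <- function(phases, prefix) {
  stopifnot(is.list(phases), !is.null(names(phases)))
  id <- Rcollectl::cl_start(prefix)
  stopped <- FALSE
  on.exit(if (!stopped) Rcollectl::cl_stop(id), add = TRUE)
  for (label in names(phases)) {
    Rcollectl::cl_timestamp(id, label)
    phases[[label]]()
  }
  Rcollectl::cl_stop(id)
  stopped <- TRUE
  Rcollectl::cl_result_path(id)
}

# Example use (requires collectl to be installed):
# path <- measure_phases(list(step1 = function() Sys.sleep(2),
#                             step2 = function() Sys.sleep(2)), "phases")
# plot_usage(cl_parse(path)) + cl_timestamp_layer(path) + cl_timestamp_label(path)
```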

3 Reproducibility

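The Rcollectl package (Carey and Cheng, 2024) was made possible thanks to: R (R Core Team, 2024), BiocStyle (Oleś, 2024), knitcitations (Boettiger, 2021), knitr (Xie, 2024), rmarkdown (Allaire, Xie, Dervieux, McPherson et al., 2024), sessioninfo (Wickham, Chang, Flight, Müller et al., 2021), and testthat (Wickham, 2011).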

This package was developed using biocthis.

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("Rcollectl.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("Rcollectl.Rmd", tangle = TRUE)
## Clean up
file.remove("Rcollectl.bib")
#> [1] TRUE

Date the vignette was generated.

#> [1] "2024-05-01 01:32:03 EDT"

Wallclock time spent generating the vignette.

#> Time difference of 14.814 secs

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 beta (2024-04-15 r86425)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-05-01
#>  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  backports       1.4.1   2021-12-13 [2] CRAN (R 4.4.0)
#>  bibtex          0.5.1   2023-01-26 [2] CRAN (R 4.4.0)
#>  BiocManager     1.30.22 2023-08-08 [2] CRAN (R 4.4.0)
#>  BiocStyle     * 2.32.0  2024-04-30 [2] Bioconductor 3.19 (R 4.4.0)
#>  bookdown        0.39    2024-04-15 [2] CRAN (R 4.4.0)
#>  bslib           0.7.0   2024-03-29 [2] CRAN (R 4.4.0)
#>  cachem          1.0.8   2023-05-01 [2] CRAN (R 4.4.0)
#>  cli             3.6.2   2023-12-11 [2] CRAN (R 4.4.0)
#>  colorspace      2.1-0   2023-01-23 [2] CRAN (R 4.4.0)
#>  digest          0.6.35  2024-03-11 [2] CRAN (R 4.4.0)
#>  dplyr           1.1.4   2023-11-17 [2] CRAN (R 4.4.0)
#>  evaluate        0.23    2023-11-01 [2] CRAN (R 4.4.0)
#>  fansi           1.0.6   2023-12-08 [2] CRAN (R 4.4.0)
#>  farver          2.1.1   2022-07-06 [2] CRAN (R 4.4.0)
#>  fastmap         1.1.1   2023-02-24 [2] CRAN (R 4.4.0)
#>  generics        0.1.3   2022-07-05 [2] CRAN (R 4.4.0)
#>  ggplot2         3.5.1   2024-04-23 [2] CRAN (R 4.4.0)
#>  glue            1.7.0   2024-01-09 [2] CRAN (R 4.4.0)
#>  gtable          0.3.5   2024-04-22 [2] CRAN (R 4.4.0)
#>  highr           0.10    2022-12-22 [2] CRAN (R 4.4.0)
#>  htmltools       0.5.8.1 2024-04-04 [2] CRAN (R 4.4.0)
#>  httr            1.4.7   2023-08-15 [2] CRAN (R 4.4.0)
#>  jquerylib       0.1.4   2021-04-26 [2] CRAN (R 4.4.0)
#>  jsonlite        1.8.8   2023-12-04 [2] CRAN (R 4.4.0)
#>  knitcitations * 1.0.12  2021-01-10 [2] CRAN (R 4.4.0)
#>  knitr           1.46    2024-04-06 [2] CRAN (R 4.4.0)
#>  labeling        0.4.3   2023-08-29 [2] CRAN (R 4.4.0)
#>  lifecycle       1.0.4   2023-11-07 [2] CRAN (R 4.4.0)
#>  lubridate       1.9.3   2023-09-27 [2] CRAN (R 4.4.0)
#>  magrittr        2.0.3   2022-03-30 [2] CRAN (R 4.4.0)
#>  munsell         0.5.1   2024-04-01 [2] CRAN (R 4.4.0)
#>  pillar          1.9.0   2023-03-22 [2] CRAN (R 4.4.0)
#>  pkgconfig       2.0.3   2019-09-22 [2] CRAN (R 4.4.0)
#>  plyr            1.8.9   2023-10-02 [2] CRAN (R 4.4.0)
#>  processx        3.8.4   2024-03-16 [2] CRAN (R 4.4.0)
#>  ps              1.7.6   2024-01-18 [2] CRAN (R 4.4.0)
#>  R6              2.5.1   2021-08-19 [2] CRAN (R 4.4.0)
#>  Rcollectl     * 1.4.0   2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
#>  Rcpp            1.0.12  2024-01-09 [2] CRAN (R 4.4.0)
#>  RefManageR      1.4.0   2022-09-30 [2] CRAN (R 4.4.0)
#>  rlang           1.1.3   2024-01-10 [2] CRAN (R 4.4.0)
#>  rmarkdown       2.26    2024-03-05 [2] CRAN (R 4.4.0)
#>  sass            0.4.9   2024-03-15 [2] CRAN (R 4.4.0)
#>  scales          1.3.0   2023-11-28 [2] CRAN (R 4.4.0)
#>  sessioninfo   * 1.2.2   2021-12-06 [2] CRAN (R 4.4.0)
#>  stringi         1.8.3   2023-12-11 [2] CRAN (R 4.4.0)
#>  stringr         1.5.1   2023-11-14 [2] CRAN (R 4.4.0)
#>  tibble          3.2.1   2023-03-20 [2] CRAN (R 4.4.0)
#>  tidyselect      1.2.1   2024-03-11 [2] CRAN (R 4.4.0)
#>  timechange      0.3.0   2024-01-18 [2] CRAN (R 4.4.0)
#>  utf8            1.2.4   2023-10-22 [2] CRAN (R 4.4.0)
#>  vctrs           0.6.5   2023-12-01 [2] CRAN (R 4.4.0)
#>  withr           3.0.0   2024-01-16 [2] CRAN (R 4.4.0)
#>  xfun            0.43    2024-03-25 [2] CRAN (R 4.4.0)
#>  xml2            1.3.6   2023-12-04 [2] CRAN (R 4.4.0)
#>  yaml            2.3.8   2023-12-11 [2] CRAN (R 4.4.0)
#> 
#>  [1] /tmp/RtmpGaZTmv/Rinst3f663e275242b2
#>  [2] /home/biocbuild/bbs-3.19-bioc/R/site-library
#>  [3] /home/biocbuild/bbs-3.19-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

4 Bibliography

This vignette was generated using BiocStyle (Oleś, 2024) with knitr (Xie, 2024) and rmarkdown (Allaire, Xie, Dervieux, McPherson et al., 2024) running behind the scenes.

Citations made with knitcitations (Boettiger, 2021).

[1] J. Allaire, Y. Xie, C. Dervieux, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.26. 2024. https://github.com/rstudio/rmarkdown.

[2] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.12. 2021. https://CRAN.R-project.org/package=knitcitations.

[3] V. Carey and Y. Cheng. Rcollectl: Help use collectl with R in Linux, to measure resource consumption in R processes. R package version 1.4.0. 2024. https://github.com/vjcitn/Rcollectl.

[4] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.32.0. 2024. https://github.com/Bioconductor/BiocStyle.

[5] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. https://www.R-project.org/.

[6] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5-10. https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[7] H. Wickham, W. Chang, R. Flight, K. Müller, et al. sessioninfo: R Session Information. R package version 1.2.2. 2021. https://CRAN.R-project.org/package=sessioninfo.

[8] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.46. 2024. https://yihui.org/knitr/.