Issues & Solutions

Original version: 16 October, 2023

library(AlphaMissenseR)

Updating duckdb to 0.9.1

The R duckdb client version 0.9.1 cannot read databases created with previous versions of the package. The duckdb error message indicates

am_available()
Error in h(simpleError(msg, call)) : error in evaluating the argument ‘drv’ in selecting a method for function ‘dbConnect’: rapi_startup: Failed to open database: IO Error: Trying to read a database file with version number 51, but we can only read version 64.

The database file was created with DuckDB version v0.8.0 or v0.8.1.

The storage of DuckDB is not yet stable; newer versions of DuckDB cannot read old database files and vice versa. The storage will be stabilized when version 1.0 releases.

For now, we recommend that you load the database file in a supported version of DuckDB, and use the EXPORT DATABASE command followed by IMPORT DATABASE on the current version of DuckDB.

See the storage page for more information: https://duckdb.org/internals/storage

but in practice the most straightforward solution is to remove existing AlphaMissenseR data resources and ‘start again’.
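For reference, the export / import route suggested in the error message might look roughly like the sketch below. It is an illustration under stated assumptions, not the procedure used by AlphaMissenseR: the file paths and the `demo` table are made up, and both steps run here under a single duckdb version only so that the example is self-contained. In a real migration the EXPORT step would run in an R session with the old duckdb version installed, and the IMPORT step in a session with the new one.

```r
## Hypothetical paths and table name, for illustration only
old_db <- tempfile(fileext = ".duckdb")
new_db <- tempfile(fileext = ".duckdb")
export_dir <- tempfile("duckdb_export_")

## Step 1 (old duckdb version): open the database and export it
old_drv <- duckdb::duckdb(dbdir = old_db)
old_con <- DBI::dbConnect(old_drv)
DBI::dbWriteTable(old_con, "demo", data.frame(id = 1:3))
DBI::dbExecute(old_con, sprintf("EXPORT DATABASE '%s'", export_dir))
DBI::dbDisconnect(old_con)
duckdb::duckdb_shutdown(old_drv)

## Step 2 (new duckdb version): import into a fresh database file
new_drv <- duckdb::duckdb(dbdir = new_db)
new_con <- DBI::dbConnect(new_drv)
DBI::dbExecute(new_con, sprintf("IMPORT DATABASE '%s'", export_dir))
n_rows <- DBI::dbGetQuery(new_con, "SELECT count(*) AS n FROM demo")$n
DBI::dbDisconnect(new_con)
duckdb::duckdb_shutdown(new_drv)
```

The remainder of this section follows the simpler route of removing the cached resources.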

The following code attempts to identify locally cached AlphaMissenseR data resources:

am_rids <-
    BiocFileCache::bfcinfo() |>
    dplyr::filter(
        grepl("zenodo", rname) |
        startsWith(rname, "AlphaMissense_")
    ) |>
    dplyr::pull(rid)
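The filter is deliberately broad, so it can also match resources that were not created by AlphaMissenseR. The sketch below applies the same two conditions to a small, entirely hypothetical cache listing (the `rid` and `rname` values are invented) to show which rows would be selected.

```r
## Hypothetical cache listing; a real one comes from BiocFileCache::bfcinfo()
cache <- data.frame(
    rid = c("BFC1", "BFC2", "BFC3"),
    rname = c(
        "AlphaMissense_hg38",                 # made-up resource name
        "https://zenodo.org/record/example",  # made-up URL
        "my_other_project.csv"                # unrelated resource
    )
)

## Same conditions as the filter above, written in base R
is_candidate <-
    grepl("zenodo", cache$rname) |
    startsWith(cache$rname, "AlphaMissense_")

candidates <- cache[is_candidate, ]
candidates
```

Any resource whose name contains "zenodo" is selected, whichever package cached it, which is why the candidates are worth reviewing before removal.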

After verifying that these resources have not been created outside AlphaMissenseR, remove them.

BiocFileCache::bfcremove(rids = am_rids)

Commands such as am_available() should report no files cached. The command

am_data("gene_hg38")
#> # Source:   table<gene_hg38> [?? x 2]
#> # Database: DuckDB v0.10.2 [biocbuild@Linux 5.15.0-105-generic:R 4.4.0//home/biocbuild/.cache/R/BiocFileCache/e375f40dae9fd_e375f40dae9fd]
#>    transcript_id      mean_am_pathogenicity
#>    <chr>                              <dbl>
#>  1 ENST00000000233.10                 0.742
#>  2 ENST00000000412.8                  0.378
#>  3 ENST00000001008.6                  0.422
#>  4 ENST00000001146.6                  0.467
#>  5 ENST00000002125.9                  0.351
#>  6 ENST00000002165.11                 0.406
#>  7 ENST00000002501.10                 0.320
#>  8 ENST00000002596.6                  0.471
#>  9 ENST00000002829.8                  0.524
#> 10 ENST00000003084.10                 0.405
#> # ℹ more rows

will re-download the file and insert it into a database compatible with the installed version of duckdb (0.9.1 or later).

Resource temporarily unavailable

Trying to access a data resource with am_data() may sometimes result in a DuckDB error about “Resource unavailable”.

> am_data("hg38")

* [10:05:09][warning] error in evaluating the argument 'drv' in selecting a
    method for function 'dbConnect': rapi_startup: Failed to open database:
    IO Error: Could not set lock on file 
    ".../Caches/org.R-project.R/R/BiocFileCache/1ec5157ddaa2_1ec5157ddaa2":
    Resource unavailable

Error in value[[3L]](cond) :
    failed to connect to DuckDB database, see 'Issues & Solutions' vignette

This occurs when the database is being used by an independent R process. The solution is to identify that process and, from within it, disconnect from the database using, e.g., db_disconnect_all().

Finally

Remember to disconnect and shut down all managed DuckDB connections.

db_disconnect_all()

Database connections that are not closed correctly trigger warning messages.
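One way to make sure the disconnect always happens is to register it with on.exit() inside a helper function. The following is a minimal sketch, not part of the package: `peek_gene_table()` is a hypothetical helper name, and it assumes that closing every managed connection when the helper returns is acceptable in your session.

```r
## Hypothetical helper: read a few rows, then always disconnect
peek_gene_table <- function(n = 6L) {
    ## Runs when the function exits, even if an error occurs below
    on.exit(AlphaMissenseR::db_disconnect_all(), add = TRUE)

    tbl <- AlphaMissenseR::am_data("gene_hg38")

    ## collect() copies the rows into memory, so the result remains
    ## usable after the connection has been closed
    dplyr::collect(head(tbl, n))
}
```

Calling `peek_gene_table()` downloads the resource if it is not already cached. Because db_disconnect_all() closes all managed connections, this pattern is best kept out of code that shares connections with other parts of a session.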

Session information

sessionInfo()
#> R version 4.4.0 RC (2024-04-16 r86468)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ensembldb_2.29.0        AnnotationFilter_1.29.0 GenomicFeatures_1.57.0 
#>  [4] AnnotationDbi_1.67.0    Biobase_2.65.0          GenomicRanges_1.57.0   
#>  [7] GenomeInfoDb_1.41.0     IRanges_2.39.0          S4Vectors_0.43.0       
#> [10] BiocGenerics_0.51.0     AlphaMissenseR_1.1.1    dplyr_1.1.4            
#> 
#> loaded via a namespace (and not attached):
#>  [1] DBI_1.2.2                   bitops_1.0-7               
#>  [3] rlang_1.1.3                 magrittr_2.0.3             
#>  [5] matrixStats_1.3.0           compiler_4.4.0             
#>  [7] RSQLite_2.3.6               png_0.1-8                  
#>  [9] vctrs_0.6.5                 ProtGenerics_1.37.0        
#> [11] pkgconfig_2.0.3             crayon_1.5.2               
#> [13] fastmap_1.1.1               dbplyr_2.5.0               
#> [15] XVector_0.45.0              utf8_1.2.4                 
#> [17] Rsamtools_2.21.0            promises_1.3.0             
#> [19] rmarkdown_2.26              UCSC.utils_1.1.0           
#> [21] purrr_1.0.2                 bit_4.0.5                  
#> [23] xfun_0.43                   zlibbioc_1.51.0            
#> [25] cachem_1.0.8                jsonlite_1.8.8             
#> [27] blob_1.2.4                  later_1.3.2                
#> [29] DelayedArray_0.31.0         BiocParallel_1.39.0        
#> [31] parallel_4.4.0              spdl_0.0.5                 
#> [33] R6_2.5.1                    bslib_0.7.0                
#> [35] rtracklayer_1.65.0          jquerylib_0.1.4            
#> [37] Rcpp_1.0.12                 SummarizedExperiment_1.35.0
#> [39] knitr_1.46                  BiocBaseUtils_1.7.0        
#> [41] httpuv_1.6.15               Matrix_1.7-0               
#> [43] tidyselect_1.2.1            abind_1.4-5                
#> [45] yaml_2.3.8                  codetools_0.2-20           
#> [47] curl_5.2.1                  rjsoncons_1.3.0            
#> [49] lattice_0.22-6              tibble_3.2.1               
#> [51] shiny_1.8.1.1               bio3d_2.4-4                
#> [53] withr_3.0.0                 KEGGREST_1.45.0            
#> [55] evaluate_0.23               r3dmol_0.1.2               
#> [57] BiocFileCache_2.13.0        Biostrings_2.73.0          
#> [59] pillar_1.9.0                BiocManager_1.30.23        
#> [61] filelock_1.0.3              MatrixGenerics_1.17.0      
#> [63] whisker_0.4.1               generics_0.1.3             
#> [65] RCurl_1.98-1.14             BiocVersion_3.20.0         
#> [67] xtable_1.8-4                glue_1.7.0                 
#> [69] lazyeval_0.2.2              tools_4.4.0                
#> [71] AnnotationHub_3.13.0        BiocIO_1.15.0              
#> [73] GenomicAlignments_1.41.0    XML_3.99-0.16.1            
#> [75] grid_4.4.0                  tidyr_1.3.1                
#> [77] colorspace_2.1-0            GenomeInfoDbData_1.2.12    
#> [79] RcppSpdlog_0.0.17           duckdb_0.10.2              
#> [81] restfulr_0.0.15             cli_3.6.2                  
#> [83] rappdirs_0.3.3              fansi_1.0.6                
#> [85] S4Arrays_1.5.0              sass_0.4.9                 
#> [87] digest_0.6.35               SparseArray_1.5.1          
#> [89] rjson_0.2.21                htmlwidgets_1.6.4          
#> [91] memoise_2.0.1               htmltools_0.5.8.1          
#> [93] lifecycle_1.0.4             httr_1.4.7                 
#> [95] mime_0.12                   bit64_4.0.5