How to mirror Bioconductor repository

Bioconductor package repositories may be mirrored with rsync This is appropriate if you or your user community requires frequent local access to many packages, where access to the main repository would be expensive or impossible (e.g., because users can only access repositories behind a firewall).

SSH key and IP address for rsync

Using rsync requires that you provide Bioconductor with minimal information that includes (a) with an ssh public key and (b) the IP address(es) from which you will perform rsync. If you would like to request rsync access please fill out this google form: private mirror/rsync request form

Public mirrors

A mirror is considered “public” if it is an option in R’s chooseBioCmirror() function and listed on our mirrors page. Public mirrors must support https on their site. If you are interested in hosting a publicly available mirror site, please fill out this google form: public Bioc mirror request form

Structure of the rsync command

The overall structure of the rsync command is

rsync SSH_OPTION -e "ssh -i /path/to/ssh" [OPTIONS] SRC DEST

SSH_OPTION is required, and tells rsync to use SSH during the transfer. An appropriate command might be

-e "ssh -i ~/.ssh/"

This can often be abbreviated to -e "ssh", and customized through an SSH config file.

[OPTIONS] determine how files are synchronized with the server. For a mirror, appropriate values are -zrtlv --delete.

SRC consists of the account and host for the connection, and the path on the host to the hierarchy to be synchronized. For instance, to synchronize the software packages on the current release branch one would use

bioc-rsync@master.bioconductor.org:3.20/bioc

DEST is the location on the local file system of the synchronized repository, e.g.,

~/bioconductor_repositories/3.20/bioc

Thus a complete command might be

rsync -e "ssh -i ~/.ssh" -zrtlv --delete \
    bioc-rsync@master.bioconductor.org:3.20/bioc \
    ~/bioconductor_repositories/3.20/bioc

Example: rsync an entire Bioconductor release

This is appropriate if you are providing your user community with a version of the entire Bioconductor release, with software, data annotation, and data experiment packages. The following uses Bioconductor version 3.20, the current release.

Directory structure

Pick a destination directory where files will be mirrored. Let’s say this will be in /dest. This directory should be served by your web server. Under that you’ll need a directory called packages. This directory must be present as it is part of the structure of a Bioconductor repository. Underneath packages should be a directory corresponding to the versions of Bioconductor that you will host. The current release version is 3.20 and the current devel version is 3.21. We recommend you use symlinks called release and devel that always point to the current release and devel versions; this way you will never have to change your rsync commands. But you should change the symlink targets with every Bioconductor release (see the release schedule for exact dates).

The following commands will create the directory structure you’ll need (remember that /dest is just an example of the destination directory you could use; you can put this directory anywhere on your system where there is enough free space).

mkdir -p /dest/packages
mkdir /dest/packages/3.20 # current release
mkdir /dest/packages/3.21 # current devel
ln -s /dest/packages/3.20 /dest/packages/release # change these links
ln -s /dest/packages/3.20 /dest/packages/devel   # every 6 months (with Bioc release)

rsync the Bioconductor release repository

To synchronize an entire release, use the command

rsync -e "ssh" -zrtlv --delete bioc-rsync@master.bioconductor.org:release /dest/packages/release

It is also possible to separately synchronize just the software packages…

rsync -e "ssh" -zrtlv --delete bioc-rsync@master.bioconductor.org:release/bioc /dest/packages/release/bioc

…or the data annotation and data experiment packages

rsync -e "ssh" -zrtlv --delete bioc-rsync@master.bioconductor.org:release/data /dest/packages/release/data

rsync the Bioconductor devel repository

To mirror all Bioconductor ‘devel’ (version 3.21) repositories:

rsync -e "ssh" -zrtlv --delete bioc-rsync@master.bioconductor.org:devel /dest/packages/devel

Bioconductor devel software repository:

rsync -e "ssh" -zrtlv --delete bioc-rsync@master.bioconductor.org:devel/bioc /dest/packages/devel/bioc

Bioconductor devel annotation and experiment data repositories:

rsync -e "ssh" -zrtlv --delete bioc-rsync@master.bioconductor.org:devel/data /dest/packages/devel/data

Additional information

Make sure the directory above packages is served by a web server.

Bioconductor is big (> 188GB for BioC 3.21). Please check the size of what will be transferred with e.g. rsync -e "ssh" -avn bioc-rsync@master.bioconductor.org:release and make sure you have enough room on your local disk before you start.

It is recommended that package repositories be synchronized once per day, scheduled with cron.

Begin using your new local repository by making it accessible on your webserver. See the “contriburl” option to install.packages() (utils) for more information.

The Bioconductor package repositories for the current release and devel reside at https://master.bioconductor.org in the Amazon cloud.