1 Introduction

RIPAT is an R package for retroviral vector integration site analysis. It was developed to find integration regions on the target sequence and to study the biological meaning of the integration pattern, such as the distance from important genomic factors and genomic features.
It is distributed from our GitHub repository (https://github.com/bioinfo16/RIPAT) and Bioconductor.

2 Dependencies

RIPAT should run on any R version, but we strongly recommend installing R version 3.5.3 or higher. RIPAT also needs the packages listed below. The install command (given in the Installation section below) installs all of them automatically. If some of the listed packages are already installed, RIPAT reinstalls the proper versions itself.

Package         Version
GenomicRanges   >= 1.34.4
ggplot2         >= 3.1.0
grDevices       >= 3.5.3
karyoploteR     >= 1.6.3
openxlsx        >= 4.1.4
plyr            >= 1.8.4
RColorBrewer    >= 1.1-2
regioneR        >= 1.12.0
stats           >= 3.5.3
stringr         >= 1.3.1
utils           >= 3.5.3
biomaRt         >= 2.38.0
rtracklayer     >= 1.42.2
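
Before installing, it can be useful to see which of these requirements an existing R library already meets. The following is a minimal sketch that uses base R only. `check_dependencies()` is a hypothetical helper written for this example and is not part of RIPAT; the minimum versions are copied from the table above.

```r
# Minimum versions copied from the dependency table.
required <- c(
  GenomicRanges = "1.34.4", ggplot2 = "3.1.0", grDevices = "3.5.3",
  karyoploteR = "1.6.3", openxlsx = "4.1.4", plyr = "1.8.4",
  RColorBrewer = "1.1-2", regioneR = "1.12.0", stats = "3.5.3",
  stringr = "1.3.1", utils = "3.5.3", biomaRt = "2.38.0",
  rtracklayer = "1.42.2"
)

# Hypothetical helper (not part of RIPAT). Returns one row per package with
# the required version, the installed version (NA if the package is not
# installed) and whether the requirement is met.
check_dependencies <- function(required) {
  installed <- vapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed;
    # this avoids loading the package just to check for it.
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))

  ok <- mapply(function(have, need) {
    !is.na(have) && package_version(have) >= package_version(need)
  }, installed, required)

  data.frame(package = names(required),
             required = unname(required),
             installed = unname(installed),
             ok = unname(ok),
             stringsAsFactors = FALSE)
}

print(check_dependencies(required))
```

Rows with `ok` equal to `FALSE` are the packages that the RIPAT installation would install or update.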

3 Installation

We strongly recommend installing RIPAT with the devtools package in R.
Alternatively, you can download the package as a zip file from our GitHub repository.
We are also uploading the package to the Bioconductor repository; this is in progress.

    # In R, using devtools:
    devtools::install_github("bioinfo16/RIPAT")

    # Or, from a shell, after downloading the package:
    R CMD INSTALL [PACKAGE_DOWNLOAD_PATH]
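
If devtools itself is not installed yet, the GitHub route needs one extra step. The sketch below wraps both steps in a function. `install_ripat()` is a hypothetical convenience wrapper written for this example, not something RIPAT provides, and the call is left commented out because it needs network access.

```r
# Hypothetical convenience wrapper (not part of RIPAT).
install_ripat <- function(repo = "bioinfo16/RIPAT") {
  # Install devtools from CRAN first if it is not available yet.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Install RIPAT from the GitHub repository named in this vignette.
  devtools::install_github(repo)
}

# Usage (needs network access):
# install_ripat()
# library(RIPAT)
```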

4 Databases

RIPAT provides four types of site annotation analysis: gene, CpG site, repeat and pathogenic variant. The package uses the Ensembl database, the UCSC genome database and NCBI ClinVar for site annotation. The Ensembl database is used for gene annotation. The UCSC genome database is needed for annotation by gene, CpG site and repeat sequence. Pathogenic variant annotation uses the NCBI ClinVar database. This package can annotate integration sites on the human genome only. The available target versions are GRCh37 and GRCh38.

5 Input format

RIPAT performs integration pattern analysis on the results of local alignment tools such as BLAST and BLAT. Each tool produces a tab-delimited result file. The specific format that RIPAT expects is shown in the table below.
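
As a generic illustration of a tab-delimited alignment result, the sketch below writes two made-up records to a temporary file and reads them back with base R. The column names are the standard fields of BLAST tabular output (`-outfmt 6`) and are an assumption here; the columns that RIPAT actually requires are the ones listed in its format table, which may differ.

```r
# Standard BLAST tabular (-outfmt 6) column names; assumed for this example.
blast_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Two made-up records, for illustration only.
example_lines <- c(
  paste("read_1", "chr1", "100.00", "60", "0", "0", "1", "60",
        "1000001", "1000060", "1e-25", "111", sep = "\t"),
  paste("read_2", "chr7", "98.33", "60", "1", "0", "1", "60",
        "5500060", "5500001", "4e-23", "104", sep = "\t")
)
tmp <- tempfile(fileext = ".txt")
writeLines(example_lines, tmp)

# Read the tab-delimited file; it has no header line.
hits <- read.delim(tmp, header = FALSE, col.names = blast_cols,
                   stringsAsFactors = FALSE)

# In nucleotide BLAST tabular output, a minus-strand hit has sstart > send.
hits$strand <- ifelse(hits$sstart <= hits$send, "+", "-")

print(hits[, c("qseqid", "sseqid", "sstart", "send", "strand")])
```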

6 Functions

In this manual, gene annotation is used as the example.
The other annotation functions work in the same way as this example.

  1. With random integration site analysis

    # Gene annotation with random integration site analysis (doRandom = TRUE)
    blast_gene10K <- annoByGene(hits = blast_obj,
                                organism = 'GRCh37',
                                interval = 5000,
                                range = c(-20000, 20000),
                                doRandom = TRUE,
                                randomSize = 10000,
                                includeUndecided = FALSE,
                                outPath = '.',
                                outFileName = 'A5_15856M_BLAST_10K')

[ Integration site distribution from Genes ]

[ Integration site distribution from transcription start site (TSS) ]

  2. Without random integration site analysis

    # Gene annotation without random integration site analysis (doRandom = FALSE)
    blast_gene_norandom <- annoByGene(hits = blast_obj,
                                      organism = 'GRCh37',
                                      interval = 5000,
                                      range = c(-20000, 20000),
                                      doRandom = FALSE,
                                      includeUndecided = FALSE,
                                      outPath = '.',
                                      outFileName = 'A5_15856M_BLAST_norandom')

[ Integration site distribution from Genes ]

[ Integration site distribution from transcription start site (TSS) ]

[ Integration sites marked with genes ]

[ Histogram of experimental data ]

[ Histogram of random data ]
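
The histograms count integration sites by their distance from a genomic feature. The sketch below shows one way such counts can be produced in base R. It assumes that `interval` is the bin width and `range` gives the lower and upper distance limits, both in base pairs; this vignette does not state their meaning explicitly, and the distances are made up for illustration.

```r
# Assumed meaning of the two arguments (not confirmed by this vignette).
interval <- 5000                   # bin width in bp
dist_range <- c(-20000, 20000)     # lower and upper distance limits in bp

# Made-up signed distances (bp) from integration sites to the nearest gene.
distances <- c(-18200, -7400, -4900, -120, 0, 350, 4800, 12000, 19999, 26000)

# Bin edges from the lower to the upper limit in steps of `interval`.
breaks <- seq(dist_range[1], dist_range[2], by = interval)

# Assign each distance to a bin; values outside the limits become NA.
bins <- cut(distances, breaks = breaks, include.lowest = TRUE, dig.lab = 6)

# Count the sites per bin; table() drops the NA values by default.
counts <- table(bins)
print(counts)
```

With these settings there are eight bins of 5000 bp each, and the site at 26000 bp falls outside the limits and is not counted.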

In the p-value plot, a filled circle indicates a significant difference between the experimental data and the random data (p-value <= 0.05). The circle size is a score obtained by transforming the p-value for visualization.

[ P-value plot ]
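
The encoding used in the p-value plot can be sketched in a few lines of base R. The significance rule (p-value <= 0.05) comes from the text above. The size score uses -log10 of the p-value, which is only an assumption made for this example; the transformation that RIPAT applies may differ. The p-values are made up.

```r
# Made-up p-values, for illustration only.
p_values <- c(0.001, 0.03, 0.05, 0.2, 0.8)

# Significance rule stated in the text: p-value <= 0.05.
significant <- p_values <= 0.05

# Assumed transformation for the circle size; RIPAT's own formula may differ.
size_score <- -log10(p_values)

# Base-graphics symbols: 19 is a filled circle, 1 is an open circle.
point_symbol <- ifelse(significant, 19, 1)

pvalue_table <- data.frame(p_value = p_values,
                           significant = significant,
                           size_score = round(size_score, 2),
                           pch = point_symbol)
print(pvalue_table)
```

Passing `pch = point_symbol` and a `cex` value derived from `size_score` to `plot()` would draw filled circles for the significant points and larger circles for smaller p-values.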

7 Session information

The example code in this vignette was run under the following conditions:

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Korean_Korea.949  LC_CTYPE=Korean_Korea.949    
[3] LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C                 
[5] LC_TIME=Korean_Korea.949    

attached base packages:
[1] parallel  stats4    stats     
[4] graphics  grDevices utils     
[7] datasets  methods  base     

other attached packages:
 [1] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2           IRanges_2.16.0
 [4] RIPAT_0.99.0    S4Vectors_0.20.1      BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1            ellipsis_0.3.0              
  [3] rprojroot_1.3-2             biovizBase_1.30.1
  [5] htmlTable_1.13.3            XVector_0.22.0             
  [7] base64enc_0.1-3             fs_1.3.1                    
  [9] dichromat_2.0-0             rstudioapi_0.11             
  [11] remotes_2.1.1              bit64_0.9-7                
  [13] AnnotationDbi_1.44.0       fansi_0.4.1                 
  [15] splines_3.5.3               knitr_1.28                  
  [17] pkgload_1.0.2               Formula_1.2-3              
  [19] Rsamtools_1.34.1            cluster_2.0.7-1
  [21] BiocManager_1.30.10         compiler_3.5.3  
 (...skipped...)

8 Citation

If you use RIPAT in your work, please cite it in your publications:

## To cite package 'RIPAT' in publications use:
## 
##   Min-Jeong Baek, Kwang-il Lim and In-Geol Choi. RIPAT (Retroviral
##   Integration Pattern Analysis Tool): an R package for analyzing
##   retroviral integration sites on human genome. Seoul, Korea (2020).
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {RIPAT : Retroviral Integration Pattern Analysis Tool},
##     author = {Min-Jeong Baek and Kwang-il Lim and In-Geol Choi},
##     year = {2020},
##   }
