CRISPR package demo, Bioc2014 Boston, July 31st 2014
========================================================

First, load the required packages and specify the input file path. We use a human sequence as input, which is included as a FASTA file in the CRISPRseek package. To perform off-target analysis, we need to load the human BSgenome package. To annotate the target and off-targets, we need to load the human transcript package. Additionally, we need to specify the file containing all restriction enzyme (RE) cut patterns. You can use the RE pattern file shipped with the CRISPRseek package or specify your own RE pattern file. Finally, specify the output directory, which is where all output files will be written.

```{r}
library(CRISPRseek)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# Output directory for all result files
outputDir <- file.path(getwd(), "CRISPRseekDemo")
# Example input sequence and RE pattern file shipped with CRISPRseek
inputFilePath <- system.file('extdata', 'inputseq.fa', package = 'CRISPRseek')
REpatternFile <- system.file('extdata', 'NEBenzymes.fa', package = 'CRISPRseek')
```

Commands to learn more about the offTargetAnalysis function and different use cases
========================================================

```{r}
?offTargetAnalysis
?compare2Sequences
?CRISPRseek
browseVignettes('CRISPRseek')
```

Scenario 1: Finding paired gRNAs with off-target analysis
========================================================

```{r}
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
    REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,
    BSgenomeName = Hsapiens, chromToSearch = "chrX",
    min.gap = 0, max.gap = 20,
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 0, outputDir = outputDir, overwrite = TRUE)
```

The maximum number of mismatches (max.mismatch) can be changed. The larger it is, the slower the analysis runs.
```{r}
# Same search as above, now allowing up to 2 mismatches
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
    REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,
    BSgenomeName = Hsapiens, chromToSearch = "chrX",
    min.gap = 0, max.gap = 20,
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 2, outputDir = outputDir, overwrite = TRUE)
```

Scenario 2: Finding paired gRNAs with restriction enzyme cut site(s) and off-target analysis
========================================================

```{r}
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
    REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,
    BSgenomeName = Hsapiens, chromToSearch = "chrX",
    min.gap = 0, max.gap = 20,
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 0, outputDir = outputDir, overwrite = TRUE)
```

Scenario 3: Finding all gRNAs with off-target analysis, which will be the slowest
========================================================

Please note that max.mismatch is set to 3 so that we can view the off-targets.

```{r}
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
    REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,
    BSgenomeName = Hsapiens, chromToSearch = "chrX",
    min.gap = 0, max.gap = 20,
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 3, outputDir = outputDir, overwrite = TRUE)
```

Scenario 4: Finding gRNAs with restriction enzyme cut site(s) and off-target analysis
========================================================

```{r}
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
    REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,
    BSgenomeName = Hsapiens, chromToSearch = "chrX",
    min.gap = 0, max.gap = 20,
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 0, outputDir = outputDir, overwrite = TRUE)
```

Scenario 5: Target and off-target analysis for user specified gRNAs
========================================================

Calling offTargetAnalysis with findgRNAs = FALSE performs target and off-target searching, scoring, and annotation for the input gRNAs. The gRNAs are also annotated with restriction enzyme cut sites for users to review later. However, pairing information is not available.

```{r}
# The input file here contains a gRNA rather than a target sequence
gRNAFilePath <- system.file('extdata', 'testHsap_GATA1_ex2_gRNA1.fa',
    package = 'CRISPRseek')
offTargetAnalysis(inputFilePath = gRNAFilePath, findgRNAsWithREcutOnly = TRUE,
    REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,
    findgRNAs = FALSE,
    BSgenomeName = Hsapiens, chromToSearch = 'chrX',
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 2, outputDir = outputDir, overwrite = TRUE)
```

Scenario 6: Quick gRNA finding without off-target analysis
========================================================

Calling offTargetAnalysis with chromToSearch = "" runs a quick gRNA search without on-target or off-target analysis. The parameters findgRNAsWithREcutOnly and findPairedgRNAOnly control whether to search only for gRNAs that overlap restriction enzyme cut sites, and whether to search only for gRNAs in a paired configuration.

```{r}
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
    REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,
    chromToSearch = "", outputDir = outputDir, overwrite = TRUE)
```

Scenario 7: Finding potential gRNAs preferentially targeting one of two alleles without running time-consuming off-target analysis on all possible gRNAs
========================================================

Below is an example that searches for all gRNAs that target at least one of the alleles. Two files are provided, containing sequences that differ by a single nucleotide polymorphism (SNP). The results are saved in the file scoresFor2InputSequences.xls in the outputDir directory.
```{r}
# The two input files differ only at SNP rs362331 (C vs. T allele)
inputFile1Path <- system.file("extdata", "rs362331C.fa", package = "CRISPRseek")
inputFile2Path <- system.file("extdata", "rs362331T.fa", package = "CRISPRseek")
seqs <- compare2Sequences(inputFile1Path, inputFile2Path,
    outputDir = outputDir, REpatternFile = REpatternFile,
    overwrite = TRUE)
```

Exercise 1
========================================================

To preferentially target one allele, select gRNA sequences that have the lowest score for the other allele. Selected gRNAs can then be examined for off-target sequences as described in Scenario 5.

Exercise 2
========================================================

Identify gRNAs that target the following two input sequences equally well with minimized off-target cleavage.

>MfSerpAEx2
GACGATGGCATCCTCCGTTCCCTGGGGCCTCCTGCTGCTGGCGGGGCTGTGCTGCCTGGCCCCCCGCTCCCTGGCCTCGAGTCCCCTGGGAGCCGCTGTCCAGGACACAGGTGCACCCCACCACGACCATGAGCACCATGAGGAGCCAGCCTGCCACAAGATTGCCCCGAACCTGGCCGACTTCGCCTTCAGCATGTACCGCCAGGTGGCGCATGGGTCCAACACCACCAACATCTTCTTCTCCCCCGTGAGCATCGCGACCGCCTTTGCGTTGCTTTCTCTGGGGGCCAAGGGTGACACTCACTCCGAGATCATGAAGGGCCTTAGGTTCAACCTCACTGAGAGAGCCGAGGGTGAGGTCCACCAAGGCTTCCAGCAACTTCTCCGCACCCTCAACCACCCAGACAACCAGCTGCAGCTGACCACTGGCAATGGTCTCTTCATCGCTGAGGGCATGAAGCTACTGGATAAGTTTTTGGAGGATGTCAAGAACCTGTACCACTCAGAAGCCTTCTCCACCAATTTCGGGGACACCGAAGCAGCCAAGAAACAGATCAACGATTATGTTGAGAAGGGAACCCAAGGGAAAATTGTGGATTTGGTCAAAGACCTTGACAAAGACACAGCTTTCGCTCTGGTGAATTACATTTTCTTTAAAG
>HsSerpAEx2
GACAATGCCGTCTTCTGTCTCGTGGGGCATCCTCCTGCTGGCAGGCCTGTGCTGCCTGGTCCCTGTCTCCCTGGCTGAGGATCCCCAGGGAGATGCTGCCCAGAAGACAGATACATCCCACCATGATCAGGATCACCCAACCTTCAACAAGATCACCCCCAACCTGGCTGAGTTCGCCTTCAGCCTATACCGCCAGCTGGCACACCAGTCCAACAGCACCAATATCTTCTTCTCCCCAGTGAGCATCGCTACAGCCTTTGCAATGCTCTCCCTGGGGACCAAGGCTGACACTCACGATGAAATCCTGGAGGGCCTGAATTTCAACCTCACGGAGATTCCGGAGGCTCAGATCCATGAAGGCTTCCAGGAACTCCTCCGTACCCTCAACCAGCCAGACAGCCAGCTCCAGCTGACCACCGGCAATGGCCTGTTCCTCAGCGAGGGCCTGAAGCTAGTGGATAAGTTTTTGGAGGATGTTAAAAAGTTGTACCACTCAGAAGCCTTCACTGTCAACTTCGGGGACACCGAAGAGGCCAAGAAACAGATCAACGATTACGTGGAGAAGGGTACTCAAGGGAAAATTGTGGATTTGGTCAAGGAGCTTGACAGAGACACAGTTTTTGCTCTGGTGAATTACATCTTCTTTAAAG

Exercise 3
========================================================

Constrain the gRNA sequence by setting gRNA.pattern to require or exclude specific features within the target site.

3a. Synthesis of gRNAs in vivo from host U6 promoters is more efficient if the first base is guanine. To maximize the efficiency, what can we set gRNA.pattern to?

3b. Synthesis of gRNAs in vitro using T7 promoters is most efficient when the first two bases are GG. To maximize the efficiency, what can we set gRNA.pattern to?

3c. Five consecutive uracils at any position of a gRNA will affect transcription elongation by RNA polymerase III. To avoid premature termination during gRNA synthesis using a U6 promoter, what can we set gRNA.pattern to?

3d. Some studies have identified sequence features that broadly correlate with lower nuclease cleavage activity, such as uracil in the last 4 positions of the guide sequence. To avoid uracil in these positions, what can we set gRNA.pattern to?

Exercise 4
========================================================

In the examples we went through, we deliberately restricted the off-target search to chromosome X. If we are interested in a genome-wide search, what should we set chromToSearch to?
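One possible way to approach Exercise 4 is sketched below. It assumes that chromToSearch = "all", which the CRISPRseek help page lists as the default, searches every chromosome of the BSgenome object; please confirm with ?offTargetAnalysis for your installed version. The names genomeWideChrom and runGenomeWide, and the temporary output directory, are made up for this sketch. The call itself mirrors Scenario 4 and is switched off by default because a genome-wide search takes much longer than chrX alone.

```r
# Sketch only: Scenario 4 rerun genome-wide instead of on chrX alone.
# Assumption: chromToSearch = "all" searches every chromosome.
genomeWideChrom <- "all"
runGenomeWide <- FALSE  # hypothetical switch; set to TRUE to actually run

needed <- c("CRISPRseek", "BSgenome.Hsapiens.UCSC.hg19",
            "TxDb.Hsapiens.UCSC.hg19.knownGene")

# Only check for (and load) the heavy packages when a run is requested.
if (runGenomeWide &&
    all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))) {
  library(CRISPRseek)
  library(BSgenome.Hsapiens.UCSC.hg19)
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)
  offTargetAnalysis(
    system.file("extdata", "inputseq.fa", package = "CRISPRseek"),
    findgRNAsWithREcutOnly = TRUE,
    REpatternFile = system.file("extdata", "NEBenzymes.fa",
                                package = "CRISPRseek"),
    findPairedgRNAOnly = FALSE,
    BSgenomeName = Hsapiens,
    chromToSearch = genomeWideChrom,
    min.gap = 0, max.gap = 20,
    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
    max.mismatch = 0,
    # Hypothetical output location, kept separate from the demo's outputDir
    outputDir = file.path(tempdir(), "CRISPRseekGenomeWide"),
    overwrite = TRUE)
}
```

Keeping max.mismatch at 0 limits the run time; raising it on a genome-wide search will slow things down further.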
Exercise 5
========================================================

Find gRNAs in a paired configuration, with a distance apart between 5 and 15, without performing off-target analysis.

Exercise 6
========================================================

Create a transcriptDB object.

Exercise 7
========================================================

Different CRISPR-Cas systems use different PAM sequences. Which parameter needs to be reset?

Exercise 8
========================================================

Different CRISPR-Cas systems use different gRNA lengths. Which parameter needs to be reset?

Exercise 9
========================================================

Which parameter needs to be reset to 8 if we are interested in finding gRNAs with a restriction enzyme pattern of size 8 or above?

Exercise 10
========================================================

A new penalty matrix has recently been derived. Which parameter needs to be set accordingly?

Exercise 11
========================================================

It has been shown that although the PAM sequence NGG is preferred, a variant NAG is also recognized with lower efficiency. A researcher wants the off-target search to include both NGG and NAG variants, while requiring that gRNAs must precede NGG. Which parameter(s) need to be set to carry out such a search?

Exercise 12
========================================================

Could you think of any other use cases?
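As a starting point for Exercise 3, candidate values for gRNA.pattern can be tried out on toy sequences before running a search. The sketch below uses only base R's grepl and does not call CRISPRseek. The three 20-nt sequences are made up for illustration, and the patterns are suggestions rather than the official answers. Two assumptions to check against ?offTargetAnalysis: that CRISPRseek applies the pattern to the gRNA in DNA letters (T in place of U), and that it accepts Perl-style lookahead as used in the third pattern. The last pattern also assumes a 20-base guide.

```r
# Toy 20-nt guide sequences in DNA letters (T stands in for U); made up for
# illustration, not taken from the demo input.
candidates <- c(
  startsGG  = "GGACCGATTACAGGCAGCAA",
  hasT5     = "ACGTTTTTACGGACCGATGT",
  endsWithT = "GACCGGATACAGGCAGCATT"
)

# Candidate regular expressions, one per sub-question of Exercise 3.
patterns <- c(
  firstBaseG   = "^G",                # 3a: first base is G
  firstTwoGG   = "^GG",               # 3b: first two bases are GG
  noRunOfFiveT = "^(?:(?!T{5}).)+$",  # 3c: no run of five T anywhere
  noTInLast4   = "^.{16}[^T]{4}$"     # 3d: no T in the last 4 of 20 bases
)

# Rows are sequences, columns are patterns; perl = TRUE enables the lookahead.
hits <- sapply(patterns, function(p) grepl(p, candidates, perl = TRUE))
rownames(hits) <- names(candidates)
print(hits)
```

A pattern that behaves as expected here can then be passed as gRNA.pattern in a quick search such as Scenario 6 to see how many gRNAs remain.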