epistasisGA 1.6.0
In addition to epistatic effects among child SNPs, the GADGETS method (Nodzenski* et al. 2022) can search for genetic interactions associated with child disease that involve multiple maternal SNPs, or those involving a combination of maternal and child SNPs (Nodzenski et al. 2023). We refer to the former as ‘epistatic maternally-mediated effects’ and the latter as ‘maternal-fetal interactions’. For full details of this application of GADGETS, analysts should consult Nodzenski et al. (2023). This vignette will demonstrate how to conduct such an analysis using the epistasisGA package. Users are encouraged to first consult the ‘GADGETS’ vignette for a detailed overview of the software options.
To search for maternal SNPs that are involved in genetic interactions associated with disease in the child, GADGETS requires case-parent triad data (i.e., it does not accommodate disease-discordant sibling studies). The method also relies on an assumption of ‘mating symmetry’ among the parents in the source population under study. Under mating symmetry, for any parental genotypes \(g_1\) and \(g_2\), mother/father pairs with genotypes \((g_1, g_2)\) should be as prevalent as those with genotypes \((g_2, g_1)\). GADGETS also assumes that no fetal SNP influences fetal survival. The validity of those assumptions should be assessed by the user.
We begin our example usage of GADGETS by loading a simulated example of case-parent triad data.
library(epistasisGA)
data(case.mci)
case <- as.matrix(case.mci)
data(dad.mci)
dad <- as.matrix(dad.mci)
data(mom.mci)
mom <- as.matrix(mom.mci)
data(snp.annotations.mci)
These data include a simulated maternal-fetal interaction involving one maternal-SNP and two child SNPs. The implicated maternal SNP genotypes are contained in column 6 of the input data, and risk-related child SNPs are in columns 12 and 18. In total, these data include 24 simulated loci, with columns 1-6, 7-12, 13-18, and 19-24 located on chromosomes 10, 11, 12, and 13, respectively. Note that the genotypes are coded as 0, 1, and 2, corresponding to counts of an analyst-designated variant allele. GADGETS does not currently accept genotype data imputed with uncertainty. In some genotypes are sporadically missing, those genotypes should be coded as NA. If one member of the triad, for instance the father, was not genotyped at all, that family should be excluded from analysis.
GADGETS was first developed to mine for higher-order genetic interactions among offspring SNPs, where, for case-parent triad data, the case genotypes are compared to those of the ‘pseudo-sibling’. We define the pseudo-sibling as the parental genotypes that the case could have inherited, but did not (pseudo-sibling = mom + dad - case). We construct the pseudo-sibling genotypes for this example as follows:
pseudo.sib <- mom + dad - case
To also include maternal SNPs in the search, GADGETS treats the maternal genotypes like those of cases, and treats the paternal SNPs like those of controls. To create the required data inputs, we therefore combine the case and maternal genotypes, and, separately, the pseudo-sibling and paternal genotypes. Note that we need to align the input data so that SNPs that appear on chromosome 10 appear in the first set of columns, chromosome 11 next, etc., and, within the chromosome specific columns, the all child SNPs appear first, followed by all parental SNPs. We start by combing case and mother data:
input.case <- cbind(case[ , 1:6], mom[ , 1:6],
case[ , 7:12], mom[ , 7:12],
case[ , 13:18], mom[ , 13:18],
case[ , 19:24], mom[ , 19:24])
Then we combine pseudo-sibling and father:
input.control <- cbind(pseudo.sib[ , 1:6], dad[ , 1:6],
pseudo.sib[ , 7:12], dad[ , 7:12],
pseudo.sib[ , 13:18], dad[ , 13:18],
pseudo.sib[ , 19:24], dad[ , 19:24])
The second step in the analysis pipeline is to pre-process the data. This step requires the user to indicate which sets of SNPs in the input should be considered non-independent under the null. Below, we assume that all SNPs located on the same nominal chromosome, child or parent, should be considered non-independent (or ‘linked’). We indicate that as follows:
ld.block.vec <- rep(12, 4)
This vector indicates that the input genetic data has 4 distinct linkage blocks, with SNPs 1-12 in the first block, 13-24 in the second block, 25-36 in the third block, and 37-48 in the fourth block. Also note that, for instance, SNPs 1-12 comprise the 6 child SNPs on chromosome 10, followed by the 6 maternal SNPs on chromosome 10.
Now, we can execute pre-processing. Importantly, users need to indicate which columns in the ‘input.case’ data object contain maternal SNPs, and which contain child SNPs.
pp.list <- preprocess.genetic.data(case.genetic.data = input.case,
complement.genetic.data = input.control,
ld.block.vec = ld.block.vec,
child.snps = c(1:6, 13:18, 25:30, 37:42),
mother.snps = c(7:12, 19:24, 31:36, 43:48))
Once the input data has been properly formatted and pre-processed, GADGETS should be run exactly as described in the ‘GADGETS’ vignette. We encourage users to read through that vignette carefully for full details on how to implement the method and how to interpret results. Below, we show the computational commands, but do not comment extensively on the underlying intuition.
For this small example, we will search for interactions among 3 and 4 genetic loci. Note that, in the results returned, GADGETS may nominate as potentially interacting SNP-sets composed of only maternal SNPs, only child SNPs, or a combination. A nominated SNP-set containing only maternal SNPs suggests epistatic maternally-mediated effects, a set of only child loci suggests child-SNP epistasis, and a SNP-set containing both maternal and child SNPs could suggest a maternal-fetal interaction.
run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 3,
results.dir = "size3_res", cluster.type = "interactive",
registryargs = list(file.dir = "size3_reg", seed = 1300),
n.islands = 8, island.cluster.size = 4,
n.migrations = 2)
run.gadgets(pp.list, n.chromosomes = 5, chromosome.size = 4,
results.dir = "size4_res", cluster.type = "interactive",
registryargs = list(file.dir = "size4_reg", seed = 1400),
n.islands = 8, island.cluster.size = 4,
n.migrations = 2)
Now we condense the results using the function combine.islands
. Note here that we need to provide annotations to match the
data input to the preprocess.genetic.data
function, so each SNP will appear twice in the annotations (once for the child, and once for the mother).
snp.anno <- snp.annotations.mci[c(rep(1:6, 2),
rep(7:12, 2),
rep(13:18, 2),
rep(19:24, 2)), ]
size3.combined.res <- combine.islands("size3_res", snp.anno,
pp.list)
size4.combined.res <- combine.islands("size4_res", snp.anno,
pp.list)
Then we check the results for SNP-sets of three elements:
library(magrittr)
library(knitr)
library(kableExtra)
kable(head(size3.combined.res)) %>%
kable_styling() %>%
scroll_box(width = "750px")
snp1 | snp2 | snp3 | snp1.rsid | snp2.rsid | snp3.rsid | snp1.risk.allele | snp2.risk.allele | snp3.risk.allele | snp1.diff.vec | snp2.diff.vec | snp3.diff.vec | snp1.allele.copies | snp2.allele.copies | snp3.allele.copies | fitness.score | n.cases.risk.geno | n.comps.risk.geno | chromosome | n.islands.found |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12 | 18 | 30 | rs2296831 | rs10834232 | rs4766312 | T | C | C | 0.4293653 | 0.4619761 | 0.4423246 | 1+ | 1+ | 1+ | 4.813884 | 64 | 0 | 12.18.30 | 8 |
In this case, we see that input SNPs 12, 18, and 30 have been nominated as potentially interacting. These are the true risk-related SNPs: SNP 12 corresponds to the genotypes for maternal SNP 6 in the ‘mom.mci’ object, SNP 18 corresponds to SNP 12 in the ‘case.mci’ object, and SNP 30 corresponds to SNP 18 in the ‘case.mci’ object. In real, or larger, applications, we anticipate GADGETS will nominate multiple distinct SNP-sets for further follow-up study.
The post-hoc ‘global test’ of association, demonstrated in the ‘GADGETS’ vignette (function global.test
), can also be applied to analyses that include maternal SNPs using the same commands. We do not repeat those here. When including maternal SNPs in the analysis, however, a small p-value from the global test could reflect the presence risk-related maternal SNPs, child SNPs, or both. Likewise, the ‘epistasis test’, also demonstrated in the ‘GADGETS’ vignette (function epistasis.test
), can also be applied to probe evidence for interaction effects among the component SNPs of specific SNP-sets. For details on how to interpret results from the epistasis test in an analysis that includes maternal SNPs, users should refer to Nodzenski et al. (2023). Note that the software will report ‘p-values’ from epistasis.test
, whereas Nodzenski* et al. (2022) and Nodzenski et al. (2023) report those instead as ‘h-values’. We use the term ‘h-value’ when epistasis.test
is run using the same data as GADGETS because, even under a no-epistasis null hypothesis, the distribution of those ‘h-values’ will not be uniform. On the other hand, if epistasis.test
is run on data independent from that used by GADGETS, the returned p-values can be used for valid hypothesis tests. In either case, a smaller value suggests more inconsistency with the no-epistasis (no-interaction) null hypothesis. Regardless, we do not demonstrate that specific functionality here and refer users to the ‘GADGETS’ vignette.
For SNP-sets that contain both maternal SNPs and child SNPs, where no maternal-SNP/child-SNP pair is located on the same nominal chromosome, we also offer an updated version of the epistasis test that is designed to reflect evidence for maternal-fetal interactions specifically. We demonstrate its usage below. Like the original epistasis test, the procedure is permutation-based, but the permutes are conducted in such a way that only joint maternal-fetal effects will be destroyed, and any other types of effects present in the SNP-set will be preserved. The caveats in interpretation for the epistasis test listed above still apply, particularly with respect to whether the resulting ‘p-value’ can be used for valid hypothesis testing. Here, because we have chosen the SNP-set to be tested based on GADGETS results and thereafter run the test on that same data, the software ‘p-value’ is actually an ‘h-value’.
We carry out the test as follows:
top.snps <- as.vector(t(size3.combined.res[1, 1:3]))
set.seed(10)
epi.test.res <- epistasis.test(top.snps, pp.list, maternal.fetal.test = TRUE)
epi.test.res$pval
## [1] 9.999e-05
The test indicates the observed fitness score is unusual under the assumption of no maternal-fetal interaction effects, consistent with the fact that we are applying the test to a SNP-set with a simulated maternal-fetal interaction effect. Users will receive a warning if this test is applied to a SNP-set where at least one maternal-SNP/child-SNP pairs located on the same chromosome, and a p-value of NA will be returned.
The same network graphics demonstrated in the ‘GADGETS’ vignette are applicable here, with the same interpretation. However, we note that maternal SNPs are now represented by squares, while child SNPs remain circles. The shapes of the child SNP symbols is given by the first element in the vector passed to argument node.shape
and the shapes of the maternal SNP symbols is the second element.
obs.res.list <- list(size3.combined.res, size4.combined.res)
set.seed(10)
graphical.scores <- compute.graphical.scores(obs.res.list, pp.list)
network.plot(graphical.scores, pp.list, graph.area = 200,
node.size = 40, vertex.label.cex = 2,
node.shape = c("circle", "square"))