Multi-trait fine-mapping
Ville Karhunen
07.11.2024
multitrait.Rmd
This is a short tutorial for multi-trait fine-mapping. Please note that the multi-trait implementation is under constant development – moreover, the current implementation is tested for only two traits, and other number of traits may return an error.
Please report any unexpected behaviour to: ville dot karhunen at mrc-bsu.cam.ac.uk.
Example data
The example data are summary statistics for two simulated traits for 842 variants within 50,000 individuals. Both traits have two causal variants, one of which is shared with both traits.
load(url("https://github.com/vkarhune/finimomData/raw/main/multi-finimom_example.RData"))
ls()
#> [1] "causals_y1" "causals_y2" "LDmat" "mafs"
#> [5] "summarystats_y1" "summarystats_y2"
# summary statistics
head(summarystats_y1)
#> beta se
#> [1,] -0.012283147 0.010012730
#> [2,] -0.022047978 0.011903286
#> [3,] -0.007153627 0.010388402
#> [4,] 0.002569805 0.007193564
#> [5,] 0.002972988 0.016465898
#> [6,] -0.012660130 0.009985195
head(summarystats_y2)
#> beta se
#> [1,] 0.0061683619 0.010012843
#> [2,] -0.0478840719 0.011901768
#> [3,] -0.0495203583 0.010386090
#> [4,] 0.0005496849 0.007193573
#> [5,] -0.0352351055 0.016465150
#> [6,] 0.0067814593 0.009985310
# causal variants
causals_y1
#> [1] 201 318
causals_y2
#> [1] 201 590
The linkage disequilibrium (LD) reference used is an out-of-sample reference, calculated based on 10,000 individuals. The allele frequencies for the variants are provided in object ‘mafs’.
Fine-mapping
The input data for multi-trait fine-mapping are given as lists:
# betas
beta <- list(summarystats_y1[,"beta"], summarystats_y2[,"beta"])
# standard errors
se <- list(summarystats_y1[,"se"], summarystats_y2[,"se"])
Multi-trait fine-mapping can be run as follows:
res <- multi_finimom(beta = beta, se = se, eaf = mafs, R = LDmat,
n = 50000, k = 2, omega = NULL,
niter = 62500, burnin = 12500,
standardize = TRUE,
verbose = TRUE,
insampleLD = FALSE,
clump = TRUE, clump_r2 = 0.995^2, check_ld = TRUE,
u = 1.05,
purity = 0.5)
#> Clumping variants at r2=0.99
#> Sampling from the posterior...
#>
#> 62500 iterations done in 20.68 seconds
res$sets
#> [[1]]
#> [[1]][[1]]
#> [1] 260 281 318 420 609 620 689
#>
#> [[1]][[2]]
#> [1] 84 201 288
#>
#>
#> [[2]]
#> [[2]][[1]]
#> [1] 590 598 603 645 653 671
#>
#> [[2]][[2]]
#> [1] 84 201 288
Choosing the tuning parameter u
Multi-FiniMOM requires the tuning parameter u as an input. This parameter controls the prior distribution for the number of causal variants. In particular, u can be used to counter a potential mismatch between the summary statistics and the LD matrix, in that larger values of u protect against false positives when using an out-of-sample LD reference.
In the example, we have used u=1.05. However, in the presence of severe LD mismatch (particularly if using a small LD reference panel, such as 1000Genomes), one might require u values up to 20 or 30 to provide reasonable fine-mapping results. A rough rule-of-thumb is to increase u until there are no credible sets for which for all of the variants within the set.
It should be noted that due to the different model formulation of the univariate and multi-trait implementations, the u values for the univariate and multi-trait fine-mapping are not directly comparable.
Session information
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] finimom_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 desc_1.4.3 R6_2.5.1 fastmap_1.2.0
#> [5] xfun_0.49 cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1
#> [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
#> [13] pkgdown_2.1.1 textshaping_0.4.0 jquerylib_0.1.4 systemfonts_1.1.0
#> [17] compiler_4.4.2 highr_0.11 tools_4.4.2 ragg_1.3.3
#> [21] evaluate_1.0.1 bslib_0.8.0 Rcpp_1.0.13-1 yaml_2.3.10
#> [25] jsonlite_1.8.9 rlang_1.1.4 fs_1.6.5