MiXeR
MiXeR [2, 3, 4] is a parametric model for the distribution of GWAS effect sizes. The parameters of a fitted MiXeR model describe a trait's genetic architecture, for example its polygenicity and discoverability [2]. The three main versions of MiXeR are univariate MiXeR [2], bivariate MiXeR [3], and GSA-MiXeR [4].
Univariate MiXeR
Univariate MiXeR [2] models effect sizes in a GWAS of a single trait.
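Like LDSC, MiXeR assumes a linear data generating model:

\begin{equation}\label{mixer_dgm}
y = G\beta + \epsilon = \sum_{i=1}^{M} g_i\beta_i + \epsilon,
\end{equation}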
where
- \(y\in\mathbb{R}^N\) is the vector of phenotypes of the \(N\) study participants. \(y\) is normalized to have sample mean 0 and sample variance 1.
- \(G \in \mathbb{R}^{N \times M}\) is the genotype matrix of the study participants, with one column per genetic variant.
- \(\beta\in\mathbb{R}^M\) is the vector of causal regression coefficients of the \(M\) genetic variants.
- \(g_i \in\mathbb{R}^N\) is the \(i\)th column of \(G\). \(g_i\) is normalized to have sample mean 0.
- \(\epsilon\in\mathbb{R}^N\) is the vector of environmental effects.
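A minimal Python sketch of this data generating model, handy for checking the later derivations numerically. Everything beyond \(y = G\beta + \epsilon\) is an illustrative assumption: genotypes are allele counts 0/1/2 drawn independently under Hardy-Weinberg equilibrium, every variant receives a Gaussian effect, the environmental variance is chosen so that heritability is roughly `h2`, and sample variances divide by \(N\). The function name `simulate_linear_model` is hypothetical.

```python
import math
import random


def simulate_linear_model(n, m, h2=0.5, seed=0):
    """Simulate y = G beta + eps for n participants and m variants.

    Illustrative assumptions (not from the text): genotypes are allele counts
    0/1/2 drawn independently under Hardy-Weinberg equilibrium, every variant
    has a Gaussian effect, and eps is Gaussian with its variance chosen so that
    the heritability is roughly h2 (0 < h2 < 1).
    """
    rng = random.Random(seed)
    cols = []
    for _ in range(m):
        p = rng.uniform(0.05, 0.5)  # allele frequency of this variant
        g = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
        mean_g = sum(g) / n
        cols.append([x - mean_g for x in g])  # g_i has sample mean 0
    beta = [rng.gauss(0.0, 1.0) for _ in range(m)]
    genetic = [sum(cols[i][k] * beta[i] for i in range(m)) for k in range(n)]
    var_genetic = sum(v * v for v in genetic) / n
    sd_eps = math.sqrt(var_genetic * (1 - h2) / h2)
    eps = [rng.gauss(0.0, sd_eps) for _ in range(n)]
    mean_eps = sum(eps) / n
    eps = [e - mean_eps for e in eps]  # so y has sample mean 0
    y = [a + e for a, e in zip(genetic, eps)]
    # Rescale y to sample variance 1; beta and eps get the same factor,
    # so y = G beta + eps still holds exactly.
    scale = math.sqrt(sum(v * v for v in y) / n)
    beta = [b / scale for b in beta]
    eps = [e / scale for e in eps]
    y = [v / scale for v in y]
    return cols, beta, eps, y


cols, beta, eps, y = simulate_linear_model(n=500, m=20, seed=1)
print(sum(y) / len(y), sum(v * v for v in y) / len(y))  # approximately 0 and 1
```

Rescaling \(\beta\) and \(\epsilon\) by the same factor as \(y\) is one way to meet the normalization of \(y\) without breaking the linear model.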
Similar to LDSC, we view these variables as random. We consider the data of the \(N\) study participants to be independent and identically distributed, and also assume that \(G, \beta,\) and \(\epsilon\) are independent. We assume:
Define:
- The population variance of genetic variant \(i\),
- The sample variance of genetic variant \(i\),
- The population correlation between variants \(i\) and \(j\), \(r_{i,j}\).
- The sample correlation between variants \(i\) and \(j\), \(\hat r_{i,j}\).
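The sample and population quantities in these definitions can be compared on simulated genotypes. This sketch assumes, for illustration only, that genotypes are allele counts under Hardy-Weinberg equilibrium (so a variant with allele frequency \(p\) has population variance \(2p(1-p)\)) and that sample variances divide by \(N\). `simulate_ld_pair` is a hypothetical toy LD model: on each haplotype, variant \(j\) copies the allele of variant \(i\) with probability `q` and is otherwise drawn independently, which makes the population correlation \(r_{i,j}\) exactly `q`.

```python
import math
import random


def sample_variance(g):
    """Sample variance of one genotype column (divisor N, an assumed convention)."""
    n = len(g)
    mu = sum(g) / n
    return sum((x - mu) ** 2 for x in g) / n


def sample_correlation(gi, gj):
    """Pearson sample correlation r_hat_{i,j} between two genotype columns."""
    n = len(gi)
    mi, mj = sum(gi) / n, sum(gj) / n
    cov = sum((a - mi) * (b - mj) for a, b in zip(gi, gj)) / n
    return cov / math.sqrt(sample_variance(gi) * sample_variance(gj))


def simulate_ld_pair(n, p, q, seed=0):
    """Hypothetical toy LD model: both variants have allele frequency p,
    population genotype variance 2p(1-p), and population correlation q."""
    rng = random.Random(seed)
    gi, gj = [], []
    for _ in range(n):
        ci = cj = 0
        for _ in range(2):  # two haplotypes per participant
            a = int(rng.random() < p)
            b = a if rng.random() < q else int(rng.random() < p)
            ci += a
            cj += b
        gi.append(ci)
        gj.append(cj)
    return gi, gj


gi, gj = simulate_ld_pair(n=20000, p=0.3, q=0.6, seed=1)
print(sample_variance(gi), 2 * 0.3 * 0.7)  # sample vs. population variance
print(sample_correlation(gi, gj), 0.6)     # sample vs. population correlation
```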
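MiXeR assumes that the true causal effect sizes follow a mixture distribution. For any variant \(i\),

\begin{equation}\label{uni_mixer_core}
\beta_i \sim C\,\mathcal{N}(0,\sigma_\beta^2) + (1-C)\,\mathcal{N}(0,0),
\end{equation}

where: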
- \(C\) is a Bernoulli random variable with parameter \(\pi_1 \in (0,1)\), i.e. \(P(C=1)=\pi_1\) is the probability that a variant is causal.
- \(\mathcal{N}(0,\sigma_\beta^2)\) is a Gaussian distribution with variance \(\sigma_\beta^2>0\), and \(\mathcal{N}(0,0)\) is a Dirac delta distribution, i.e. a point mass at zero [1].
- The Bernoulli and Gaussian distributions are independent of one another.
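Sampling effect sizes from this mixture takes a few lines of Python. This sketch assumes the \(\beta_i\) are drawn independently across variants; `sample_mixer_effects` is a hypothetical name. Under the mixture, \(E[\beta_i]=0\) and \(E[\beta_i^2]=\pi_1\sigma_\beta^2\), which the empirical fraction of nonzero effects and the empirical second moment should approximate.

```python
import random


def sample_mixer_effects(m, pi1, sigma2_beta, seed=0):
    """Draw beta_1..beta_m, each from C * N(0, sigma2_beta) + (1 - C) * N(0, 0).

    Assumes the draws are independent across variants. Drawing the Gaussian
    only when C = 1 gives the same distribution as drawing C and the Gaussian
    independently and multiplying them.
    """
    rng = random.Random(seed)
    sd = sigma2_beta ** 0.5
    beta = []
    for _ in range(m):
        causal = rng.random() < pi1  # C = 1 with probability pi1
        beta.append(rng.gauss(0.0, sd) if causal else 0.0)
    return beta


beta = sample_mixer_effects(m=100_000, pi1=0.01, sigma2_beta=2.0, seed=1)
frac_causal = sum(b != 0.0 for b in beta) / len(beta)  # estimates pi1
second_moment = sum(b * b for b in beta) / len(beta)   # estimates pi1 * sigma2_beta
print(frac_causal, second_moment)
```

Pushing `pi1` towards 1 makes every variant causal, which recovers the infinitesimal model as a limiting case.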
By allowing some variants to have no effect, \((\ref{uni_mixer_core})\) is more general than the commonly used infinitesimal model, which assumes that every variant affects the phenotype; the infinitesimal model is recovered in the limit \(\pi_1 \to 1\).
Next Steps
\((\ref{mixer_dgm})\) and \((\ref{uni_mixer_core})\) specify the core univariate MiXeR model. We now need a strategy to fit this model to data. The next steps are:
- Write this model as a probability distribution over observed GWAS \(z\)-scores of genetic variants.
- Using this probability distribution, derive an efficient algorithm to fit the MiXeR model by maximum likelihood.
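To make the fitting step concrete, here is a heavily simplified sketch that is not MiXeR's actual likelihood. It assumes no LD, the same sample variance \(H\) for every variant, and that everything other than the variant's own effect adds unit-variance Gaussian noise, so each \(z_i\) follows the two-component mixture \(\pi_1\,\mathcal{N}(0, 1 + s^2) + (1-\pi_1)\,\mathcal{N}(0,1)\) with \(s^2 = N H \sigma_\beta^2\). A crude grid search stands in for a proper numerical optimizer, and all function names are hypothetical.

```python
import math
import random


def normal_logpdf(x, var):
    """Log density of N(0, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def loglik(z_scores, pi1, s2):
    """Log-likelihood of z-scores under the toy model
    z_i ~ pi1 * N(0, 1 + s2) + (1 - pi1) * N(0, 1), with s2 = N * H * sigma2_beta."""
    log_p1, log_p0 = math.log(pi1), math.log1p(-pi1)
    total = 0.0
    for z in z_scores:
        a = log_p1 + normal_logpdf(z, 1.0 + s2)  # variant is causal
        b = log_p0 + normal_logpdf(z, 1.0)       # variant is null
        top = max(a, b)
        total += top + math.log(math.exp(a - top) + math.exp(b - top))  # log-sum-exp
    return total


def fit_grid(z_scores, pi1_grid, s2_grid):
    """Return the (pi1, s2) pair on the grid with the highest log-likelihood."""
    candidates = [(p, s) for p in pi1_grid for s in s2_grid]
    return max(candidates, key=lambda ps: loglik(z_scores, ps[0], ps[1]))


def simulate_z(m, pi1, s2, seed=0):
    """Draw m z-scores from the same toy mixture (for testing the fit)."""
    rng = random.Random(seed)
    sd_causal = math.sqrt(1.0 + s2)
    return [rng.gauss(0.0, sd_causal if rng.random() < pi1 else 1.0) for _ in range(m)]


z = simulate_z(10_000, pi1=0.05, s2=20.0, seed=1)
pi1_grid = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
s2_grid = [2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
pi1_hat, s2_hat = fit_grid(z, pi1_grid, s2_grid)
print(pi1_hat, s2_hat)
```

In this toy model \(\pi_1\) and \(s^2\) trade off through \(E[z_i^2] = 1 + \pi_1 s^2\); only the shape of the tails separates "few large effects" from "many small effects".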
Distribution of \(z\)-scores
We begin by deriving an expression for \(\hat{\beta}_i\), the univariate regression coefficient of variant \(i\).
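The least-squares coefficient can be checked numerically. This sketch assumes \(g_i\) is centered, in which case the univariate OLS slope is \(\hat\beta_i = g_i^\top y / g_i^\top g_i\); substituting \(y = \sum_j g_j\beta_j + \epsilon\) gives \(\hat\beta_i = \sum_j (g_i^\top g_j / g_i^\top g_i)\,\beta_j + g_i^\top\epsilon / g_i^\top g_i\). The three-variant setup (a causal variant, a non-causal near-copy of it, and an independent variant) and all names are hypothetical, and \(y\) is not rescaled to unit variance for brevity.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def centered(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def marginal_beta_hats(n=3000, seed=3):
    """Return (beta_hat_i, decomposition_i) for three hypothetical variants."""
    rng = random.Random(seed)

    def genotype(p):
        return (rng.random() < p) + (rng.random() < p)

    g1 = [genotype(0.3) for _ in range(n)]
    g2 = [x if rng.random() < 0.9 else genotype(0.3) for x in g1]  # near-copy of g1
    g3 = [genotype(0.4) for _ in range(n)]                          # independent
    cols = [centered(g) for g in (g1, g2, g3)]
    beta = [0.5, 0.0, -0.3]  # only variants 1 and 3 are causal
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [sum(c[k] * b for c, b in zip(cols, beta)) + eps[k] for k in range(n)]
    out = []
    for g_i in cols:
        gg = dot(g_i, g_i)
        beta_hat = dot(g_i, y) / gg  # univariate OLS slope (g_i is centered)
        decomposition = sum(dot(g_i, g_j) / gg * b_j for g_j, b_j in zip(cols, beta))
        decomposition += dot(g_i, eps) / gg  # environmental part
        out.append((beta_hat, decomposition))
    return out


for i, (beta_hat, decomposition) in enumerate(marginal_beta_hats(), start=1):
    print(i, round(beta_hat, 3), round(decomposition, 3))
```

Variant 2 gets a clearly nonzero \(\hat\beta_2\) although its true effect is 0, because \(g_2^\top g_1 \neq 0\): the marginal coefficient absorbs the effects of correlated variants.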
Next, we compute the standard error of \(\hat{\beta}_i\).
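where \((\ref{se_step})\) holds because we assume a polygenic trait, for which no single variant explains a non-negligible proportion of the phenotypic variance; the residual variance after regressing \(y\) on \(g_i\) is therefore approximately the total variance of \(y\), which is 1.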
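A quick numerical check of this approximation, assuming (as in the text) that \(y\) has sample mean 0 and sample variance 1, and taking \(\hat H_i\) to be the sample variance of \(g_i\) with divisor \(N\) (our notation). The exact OLS standard error is \(\sqrt{\mathrm{RSS}/((N-2)\,g_i^\top g_i)}\), so its ratio to \(1/\sqrt{N\hat H_i}\) is \(\sqrt{N(1-R^2)/(N-2)}\): close to 1 only when variant \(i\) explains little of the variance. The helper names and simulation settings are hypothetical.

```python
import math
import random


def se_exact_and_approx(g, y):
    """Return (exact OLS standard error of beta_hat_i, approximation 1 / sqrt(N * H_hat_i)).

    Assumes y already has sample mean 0 and sample variance 1 (divisor N)."""
    n = len(g)
    mean_g = sum(g) / n
    gc = [x - mean_g for x in g]
    sgg = sum(x * x for x in gc)  # g_i . g_i = N * H_hat_i
    beta_hat = sum(a * b for a, b in zip(gc, y)) / sgg
    rss = sum((b - beta_hat * a) ** 2 for a, b in zip(gc, y))  # intercept is 0 here
    se_exact = math.sqrt(rss / ((n - 2) * sgg))
    return se_exact, 1.0 / math.sqrt(sgg)


def simulate_one_variant(n, beta, noise_var, seed=0):
    """Hypothetical single-variant trait; y is normalized to mean 0 and variance 1."""
    rng = random.Random(seed)
    g = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, math.sqrt(noise_var)) for x in g]
    mean_y = sum(y) / n
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    return g, [(v - mean_y) / sd_y for v in y]


for beta, noise_var in [(0.02, 1.0), (0.6, 0.6)]:  # tiny vs. large effect
    se, approx = se_exact_and_approx(*simulate_one_variant(2000, beta, noise_var, seed=2))
    print(beta, se / approx)
```

For the tiny effect the ratio is essentially 1; for a variant explaining about 20% of the variance it drops to roughly 0.9, which is why the approximation relies on polygenicity.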
It follows that the \(z\)-score of variant \(i\) is:
Equation \((\ref{mixer_zscore_eq})\) relates the distribution of the observed \(z\)-scores to the distribution of the \(\beta_i\), which is determined by the model parameters.
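The sketch below checks one algebraic form of the \(z\)-score numerically. It assumes the approximation \(\mathrm{SE}(\hat\beta_i) \approx 1/\sqrt{N\hat H_i}\) and sample variances \(\hat H_j = g_j^\top g_j / N\) (our notation, which may differ from \((\ref{mixer_zscore_eq})\)). Under these assumptions \(z_i = \sqrt{N\hat H_i}\,\hat\beta_i\) splits exactly into \(\sum_j \hat r_{i,j}\sqrt{N\hat H_j}\,\beta_j\) plus the noise term \(g_i^\top\epsilon/\sqrt{N\hat H_i}\). The toy data and function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def toy_data(n=1000, m=5, seed=4):
    """Hypothetical toy data: centered 0/1/2 genotype columns, Gaussian effects and noise."""
    rng = random.Random(seed)
    cols = []
    for _ in range(m):
        p = rng.uniform(0.1, 0.5)
        g = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
        mean_g = sum(g) / n
        cols.append([x - mean_g for x in g])
    beta = [rng.gauss(0.0, 0.1) for _ in range(m)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return cols, beta, eps


def zscore_parts(cols, beta, eps, i):
    """Return (z_i, genetic part, noise part) for variant i."""
    n = len(eps)
    y = [sum(g[k] * b for g, b in zip(cols, beta)) + eps[k] for k in range(n)]
    h_i = dot(cols[i], cols[i]) / n
    beta_hat_i = dot(cols[i], y) / (n * h_i)
    z_i = math.sqrt(n * h_i) * beta_hat_i  # uses SE ~ 1 / sqrt(N * H_hat_i)
    genetic = 0.0
    for g_j, b_j in zip(cols, beta):
        h_j = dot(g_j, g_j) / n
        r_ij = dot(cols[i], g_j) / (n * math.sqrt(h_i * h_j))  # sample correlation
        genetic += r_ij * math.sqrt(n * h_j) * b_j
    noise = dot(cols[i], eps) / math.sqrt(n * h_i)
    return z_i, genetic, noise


cols, beta, eps = toy_data()
for i in range(len(cols)):
    print(i, zscore_parts(cols, beta, eps, i))
```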
It is useful to regroup the terms in \((\ref{mixer_zscore_eq})\). Define \(\mathrm{LD}(i)=\{j : r_{i,j}\neq 0\}\), the set of variants in linkage disequilibrium with \(i\) (including \(i\) itself, since \(r_{i,i}=1\)).
We have
Note that even though the population correlation \(r_{i,j}=0\) for every variant \(j \notin\mathrm{LD}(i)\), the sample correlation \(\hat r_{i,j}\) is in general nonzero in a finite sample; for independent variants it is typically of order \(1/\sqrt{N}\).
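A small simulation of this point, assuming independent variants (population correlation 0) with allele-count genotypes; the function names and allele frequencies are hypothetical. The average squared sample correlation comes out close to \(1/N\), i.e. chance correlations of order \(1/\sqrt{N}\).

```python
import math
import random


def genotypes(n, p, rng):
    """Allele counts 0/1/2 for n participants at allele frequency p."""
    return [(rng.random() < p) + (rng.random() < p) for _ in range(n)]


def sample_correlation(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def mean_squared_chance_correlation(n, pairs=200, seed=0):
    """Average r_hat^2 over independently simulated pairs of unlinked variants."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        r = sample_correlation(genotypes(n, 0.3, rng), genotypes(n, 0.2, rng))
        total += r * r
    return total / pairs


results = {n: mean_squared_chance_correlation(n) for n in (500, 2000)}
for n, m2 in results.items():
    print(n, n * m2)  # N * mean(r_hat^2) stays near 1
```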
Variance decomposition
Next, let us compute the variance of \(e_i\).
Focusing on the second term,
to be continued
[1] \(\beta_i\sim \mathcal{N}(0,0)\) means \(P(\beta_i=0)=1\).

[2] Dominic Holland, Oleksandr Frei, Rahul Desikan, Chun-Chieh Fan, Alexey A Shadrin, Olav B Smeland, Vijay S Sundar, Paul Thompson, Ole A Andreassen, and Anders M Dale. Beyond SNP heritability: polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model. PLoS Genetics, 16(5):e1008612, 2020. URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008612.

[3] Oleksandr Frei, Dominic Holland, Olav B Smeland, Alexey A Shadrin, Chun Chieh Fan, Steffen Maeland, Kevin S O’Connell, Yunpeng Wang, Srdjan Djurovic, Wesley K Thompson, and others. Bivariate causal mixture model quantifies polygenic overlap between complex traits beyond genetic correlation. Nature Communications, 10(1):2417, 2019. URL: https://www.nature.com/articles/s41467-019-10310-0.

[4] Oleksandr Frei, Guy Hindley, Alexey A Shadrin, Dennis van der Meer, Bayram C Akdeniz, Weiqiu Cheng, Kevin S O’Connell, Shahram Bahrami, Nadine Parker, Olav B Smeland, and others. Improved functional mapping with GSA-MiXeR implicates biologically specific gene-sets and estimates enrichment magnitude. medRxiv 2022.12.08.22283159, 2022. URL: https://www.medrxiv.org/content/10.1101/2022.12.08.22283159v1.