S-LDSC LDL Analysis
I applied Stratified Linkage Disequilibrium Score Regression (S-LDSC) to summary statistics from a genome-wide association study (GWAS) of LDL cholesterol in the Million Veteran Program (MVP). The goal was to identify tissues and cell types that may play a key role in determining LDL levels.
Reference Data Sources
I used the reference datasets recommended and preprocessed by the authors of S-LDSC. These reference datasets are ultimately derived from the following sources:
- The GTEx Project
- The Franke lab dataset
- The Roadmap Epigenomics Project
- The Corces et al. ATAC-seq dataset of 13 blood cell types
- The ImmGen Project
- The Cahoy Mouse Central Nervous System Dataset
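Each of these reference datasets is distributed as a set of per-annotation LD scores indexed by an `.ldcts` file, and each is analyzed in a separate `ldsc.py --h2-cts` run. The sketch below only builds those command lines and does not run them. The `.ldcts` file names, the baseline and weights prefixes, the summary-statistics file name, and the helper `h2_cts_command` are assumptions modeled on the LDSC wiki's cell-type-specific tutorial, so adjust them to match the actual downloads.

```python
import shlex
from pathlib import Path

# Hypothetical prefixes; point these at the downloaded LDSC reference data.
BASELINE_PREFIX = "1000G_EUR_Phase3_baseline/baseline."
WEIGHTS_PREFIX = "weights_hm3_no_hla/weights."

# One .ldcts index file per reference dataset (file names are assumptions).
LDCTS_FILES = {
    "gtex_franke": "Multi_tissue_gene_expr.ldcts",
    "roadmap": "Multi_tissue_chromatin.ldcts",
    "immgen": "ImmGen.ldcts",
    "corces": "Corces_ATAC.ldcts",
    "cahoy": "Cahoy.ldcts",
    "gtex_brain": "GTEx_brain.ldcts",
}


def h2_cts_command(sumstats: str, dataset: str, out_dir: str = "results") -> list[str]:
    """Build the argument list for one ldsc.py --h2-cts run (hypothetical helper)."""
    out_prefix = str(Path(out_dir) / f"LDL_{dataset}")
    return [
        "python", "ldsc.py",
        "--h2-cts", sumstats,                       # munged GWAS summary statistics
        "--ref-ld-chr", BASELINE_PREFIX,            # baseline annotations, conditioned on
        "--ref-ld-chr-cts", LDCTS_FILES[dataset],   # the cell-type annotations to test
        "--w-ld-chr", WEIGHTS_PREFIX,               # regression weights
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    for name in LDCTS_FILES:
        # Print each command; pass it to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(h2_cts_command("LDL_MVP.sumstats.gz", name)))
```

Keeping one run per dataset mirrors the structure of the Results section below, where each reference dataset is reported separately.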
Results
GTEx and Franke lab tissue expression data
When S-LDSC is applied to GWAS summary statistics using a reference dataset of cell types, it returns a coefficient \(\tau_i\) for each cell type \(i\), together with an associated \(p\)-value. A large positive coefficient with a small \(p\)-value suggests that the genomic regions linked to cell type \(i\) (for example, genes specifically expressed in it) explain more of the trait's heritability than expected, even after accounting for the baseline annotations.
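For reference, the regression behind these coefficients can be written roughly as follows. This is my paraphrase of the model in the S-LDSC papers, with notation chosen to match \(\tau_i\) above, for a binary annotation:

\[
\mathbb{E}\left[\chi^2_j\right]
  = N \left( \sum_{C \in \mathcal{B}} \tau_C \, \ell(j, C) + \tau_i \, \ell(j, i) \right) + N a + 1,
\qquad
\ell(j, C) = \sum_{k \in C} r_{jk}^2
\]

Here \(\chi^2_j\) is the association statistic for SNP \(j\), \(N\) is the GWAS sample size, \(\mathcal{B}\) is the set of baseline annotations, \(r_{jk}^2\) is the squared correlation between SNPs \(j\) and \(k\), \(\ell(j, C)\) is the LD score of SNP \(j\) with respect to annotation \(C\), and \(a\) captures confounding. A positive \(\tau_i\) means that SNPs in annotation \(i\) carry more heritability per SNP than the baseline annotations already explain.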
The graph below shows the coefficient \(p\)-values for the tissues and cell types in the GTEx/Franke lab dataset when S-LDSC is applied to the LDL GWAS. Tissues and cell types are grouped into categories using the same scheme as the S-LDSC cell-type paper[1].
Of the four significant annotations, the top three are all liver-related (two liver tissue annotations and hepatocytes). Consistent with the MAGMA results, this suggests that the liver is central to the physiological processes that determine LDL levels. Note, however, that since both S-LDSC and MAGMA use GTEx data, this is not a fully independent line of evidence.
The fourth significant annotation is a kidney tissue (Kidney_Cortex, on the right side of the graph). It is unclear to me whether this reflects real biology or is an artifact.
Several adipose-related annotations also fall just short of significance (\(p \approx 0.002\)), suggesting that a larger study might implicate adipose tissue in LDL regulation. This would be consistent with the known association of higher BMI with LDL levels.
Here are the top associations in tabular form:
| Name | Coefficient | Coefficient_P_value | Reject Null |
|---|---|---|---|
| A03.620.Liver | 1.5215e-08 | 0.000211744 | True |
| A11.436.348.Hepatocytes | 2.30756e-08 | 0.000254927 | True |
| Liver | 1.82315e-08 | 0.000335455 | True |
| Kidney_Cortex | 9.50672e-09 | 0.00045596 | True |
| A10.165.114.830.500.750.Subcutaneous.Fat..Abdominal | 1.13522e-08 | 0.00205337 | False |
| A10.165.114.830.750.Subcutaneous.Fat | 1.11345e-08 | 0.00237814 | False |
| A11.497.497.600.Oocytes | 7.96733e-09 | 0.00312142 | False |
| A06.407.071.Adrenal.Glands | 8.71681e-09 | 0.00736218 | False |
| A15.382.490.315.583.Neutrophils | 8.68251e-09 | 0.00957292 | False |
| Pancreas | 8.30563e-09 | 0.00992878 | False |
| A05.360.490.Germ.Cells | 6.45252e-09 | 0.0110824 | False |
| A06.407.071.140.Adrenal.Cortex | 7.33265e-09 | 0.013978 | False |
| A03.556.249.124.Ileum | 1.43982e-08 | 0.0146156 | False |
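The text does not state how the Reject Null column was computed. One common choice, assumed here purely for illustration, is the Benjamini-Hochberg procedure at a 5% false discovery rate. It would be applied to the \(p\)-values of every annotation in a reference dataset, not only to the top rows shown. The function name `benjamini_hochberg` is hypothetical. A minimal sketch:

```python
def benjamini_hochberg(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    """Return a reject flag per p-value using the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    # Indices sorted by p-value, smallest first.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject
```

For example, `benjamini_hochberg([0.001, 0.03, 0.034, 0.9])` returns `[True, True, True, False]`. The value 0.03 fails its own rank threshold (0.025) but is still rejected, because the larger \(p\)-value 0.034 passes its threshold (0.0375). This step-up behavior is the main difference from a fixed per-test cutoff.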
Roadmap Chromatin data
I next applied S-LDSC to the LDL GWAS using reference data generated by Finucane et al.[1] from the Roadmap Epigenomics Project (supplemented with ENTEx data, hence the liver_ENTEX rows below). This dataset annotates genomic regions with the histone marks found on those regions in particular tissues, and these annotations can then be used as S-LDSC inputs. The rationale is this: suppose a trait's GWAS associations are concentrated in regions that carry active chromatin marks in a given tissue. Then that tissue may be important for the trait.
The following graph and table show the results:
| Name | Coefficient | Coefficient_P_value | Reject Null |
|---|---|---|---|
| liver_ENTEX__H3K27ac | 1.34601e-07 | 7.4945e-05 | True |
| Liver__H3K27ac | 1.83538e-07 | 8.36467e-05 | True |
| Liver__H3K4me1 | 1.36859e-07 | 0.000236273 | True |
| Adipose_Nuclei__H3K9ac | 2.31322e-07 | 0.000333111 | True |
| Adipose_Nuclei__H3K27ac | 9.55665e-08 | 0.000401196 | True |
| Liver__H3K9ac | 3.69786e-07 | 0.000448776 | True |
| Adipose_Nuclei__H3K4me1 | 9.55984e-08 | 0.000469102 | True |
| liver_ENTEX__H3K4me3 | 4.77665e-07 | 0.000918027 | False |
| Fetal_Adrenal_Gland__H3K4me3 | 3.23248e-07 | 0.00133247 | False |
| Fetal_Intestine_Large__H3K4me1 | 1.07869e-07 | 0.0018057 | False |
| Liver__H3K36me3 | 1.40152e-07 | 0.00203915 | False |
| Liver__H3K4me3 | 3.02922e-07 | 0.00208174 | False |
| Duodenum_Mucosa__H3K4me1 | 9.94451e-08 | 0.00223099 | False |
Note that all the histone marks in the table above (H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K36me3) are associated with active regulatory elements or active transcription.
Consistent with the gene expression results above, these results point to the liver as a key site of LDL physiology. Unlike the GTEx results, they also significantly implicate adipose tissue.
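The Roadmap/ENTEx annotation names appear to follow a `<tissue>__<mark>` pattern. Because of that, the results can be summarized per tissue, which makes the liver and adipose pattern easy to see. A minimal sketch, assuming the LDSC results file is tab-separated with at least `Name` and `Coefficient_P_value` columns. The function name `marks_by_tissue` and the cutoff used in the usage example are illustrative only:

```python
import csv
import io
from collections import defaultdict


def marks_by_tissue(results_tsv: str, p_threshold: float) -> dict[str, list[str]]:
    """Group annotations with p <= p_threshold by tissue, listing each tissue's histone marks."""
    grouped: dict[str, list[str]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(results_tsv), delimiter="\t")
    for row in reader:
        if float(row["Coefficient_P_value"]) <= p_threshold:
            # Split "<tissue>__<mark>" on the last double underscore (assumed naming scheme).
            tissue, _, mark = row["Name"].rpartition("__")
            grouped[tissue].append(mark)
    return dict(grouped)


sample = (
    "Name\tCoefficient\tCoefficient_P_value\n"
    "Liver__H3K27ac\t1.83538e-07\t8.36467e-05\n"
    "Adipose_Nuclei__H3K9ac\t2.31322e-07\t0.000333111\n"
    "Liver__H3K4me1\t1.36859e-07\t0.000236273\n"
    "Liver__H3K36me3\t1.40152e-07\t0.00203915\n"
)
print(marks_by_tissue(sample, p_threshold=0.0005))
```

The sample rows are copied from the table above. With the illustrative cutoff, H3K36me3 drops out and the result groups two liver marks and one adipose mark.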
ImmGen data
Next, I used the S-LDSC reference data derived from the ImmGen project. This yields no significant cell-type hits. The top non-significant hits are:
| Name | Coefficient | Coefficient_P_value | Reject Null |
|---|---|---|---|
| T.4SP69+.Th | 2.09672e-08 | 0.00229273 | False |
| T.4Nve.Sp | 1.61296e-08 | 0.0091171 | False |
| Mo.6C-IIint.Bl | 1.24572e-08 | 0.0098925 | False |
| T.4Nve.LN | 1.37083e-08 | 0.0120702 | False |
| T.4.LN.BDC | 1.47838e-08 | 0.012287 | False |
| LN.TR.14w.B6 | 1.06239e-08 | 0.0138859 | False |
| MF.480int.LV.Naive | 8.75148e-09 | 0.014214 | False |
| MF.480hi.LV.Naive | 9.3381e-09 | 0.0170341 | False |
This is to be expected, as LDL levels are not thought to be primarily driven by immune processes.
Corces ATAC-seq data
Another blood-cell reference dataset is the Corces et al. ATAC-seq dataset of hematopoietic cell types. Again, there are no significant hits. The top non-significant hits are:
| Name | Coefficient | Coefficient_P_value | Reject Null |
|---|---|---|---|
| Mono | 3.39149e-08 | 0.236073 | False |
| MEP | -8.09368e-09 | 0.588996 | False |
| Erythro | -3.47455e-08 | 0.694423 | False |
| NK | -3.37224e-08 | 0.753868 | False |
| HSC | -2.53803e-08 | 0.772651 | False |
| CMP | -2.52663e-08 | 0.777761 | False |
| CD4 | -4.49983e-08 | 0.783817 | False |
| CLP | -7.29466e-08 | 0.886496 | False |
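In the Corces table, every negative coefficient comes with \(p > 0.5\). That is what one would expect if the reported \(p\)-value is one-sided, testing \(\tau_i > 0\) via the upper tail of a standard normal distribution for \(z = \tau_i / \mathrm{SE}(\tau_i)\). As I understand [1], the test is one-sided. The exact formula below is my assumption and is not copied from the LDSC code. The standard errors are not reported in these tables, so the sketch is illustrated with arbitrary inputs:

```python
import math


def one_sided_p(coefficient: float, std_error: float) -> float:
    """Upper-tail p-value for H1: coefficient > 0, assuming z = coefficient / SE ~ N(0, 1) under H0."""
    z = coefficient / std_error
    # 1 - Phi(z), written with erfc for better accuracy far in the upper tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

With this formula, any negative coefficient gives \(p > 0.5\) and a coefficient of exactly zero gives \(p = 0.5\). This matches the pattern in the Corces, Cahoy, and GTEx brain tables.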
Cahoy and GTEx-Brain data
The next two reference datasets pertain to the central nervous system: the Cahoy mouse CNS cell types and the GTEx brain regions. Again, there are no significant hits. The top non-significant hits for the Cahoy data are:
| Name | Coefficient | Coefficient_P_value | Reject Null |
|---|---|---|---|
| Astrocyte | 1.97982e-09 | 0.291024 | False |
| Neuron | 2.33232e-10 | 0.466565 | False |
| Oligodendrocyte | -5.54881e-09 | 0.936155 | False |
and for the GTEx brain regions:
| Name | Coefficient | Coefficient_P_value | Reject Null |
|---|---|---|---|
| Brain_Nucleus_accumbens_(basal_ganglia) | 6.56949e-09 | 0.0297949 | False |
| Brain_Caudate_(basal_ganglia) | 4.43273e-09 | 0.0727096 | False |
| Brain_Putamen_(basal_ganglia) | 3.45177e-09 | 0.0808186 | False |
| Brain_Anterior_cingulate_cortex_(BA24) | 1.83247e-09 | 0.249405 | False |
| Brain_Cortex | 1.85239e-09 | 0.282159 | False |
| Brain_Cerebellar_Hemisphere | 1.55714e-09 | 0.334468 | False |
| Brain_Substantia_nigra | 8.98458e-10 | 0.387369 | False |
| Brain_Cerebellum | -9.68139e-10 | 0.584834 | False |
This is perhaps to be expected, given that no CNS-related categories were significant in the coarser-grained multi-tissue gene expression and chromatin analyses.
Reproducing Analysis
To reproduce this analysis, run the LDL GWAS Analysis Script.
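The analysis script itself is not reproduced here. As a hedged illustration of the final step, the tables in this report could be generated from an LDSC per-dataset results file with something like the sketch below. The function name `top_hits_markdown` is hypothetical, and the tab-separated layout with `Name`, `Coefficient`, and `Coefficient_P_value` columns is an assumption. The script may well do this differently.

```python
import csv
import io


def top_hits_markdown(results_tsv: str, n: int = 10) -> str:
    """Render the n rows with the smallest coefficient p-values as a Markdown table."""
    rows = list(csv.DictReader(io.StringIO(results_tsv), delimiter="\t"))
    # Sort ascending by p-value so the strongest enrichments come first.
    rows.sort(key=lambda r: float(r["Coefficient_P_value"]))
    lines = ["| Name | Coefficient | Coefficient_P_value |", "|---|---|---|"]
    for r in rows[:n]:
        lines.append(f"| {r['Name']} | {r['Coefficient']} | {r['Coefficient_P_value']} |")
    return "\n".join(lines)
```

For example, you could read a results file such as the hypothetical `results/LDL_gtex_franke.cell_type_results.txt` into a string and call `top_hits_markdown(text, n=13)`. That would give a table shaped like the GTEx/Franke table above, without the Reject Null column.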
[1] Hilary K Finucane, Yakir A Reshef, Verneri Anttila, Kamil Slowikowski, Alexander Gusev, Andrea Byrnes, Steven Gazal, Po-Ru Loh, Caleb Lareau, Noam Shoresh, and others. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature Genetics, 50(4):621–629, 2018. URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC5896795/.