S-LDSC LDL Analysis

I applied Stratified Linkage Disequilibrium Score Regression (S-LDSC) to summary statistics from a GWAS of LDL from the Million Veteran Program, in order to identify tissues and cell types that may play a key role in determining LDL levels.

Reference Data Sources

I used the reference datasets recommended and preprocessed by the authors of the S-LDSC method. These reference datasets are ultimately drawn from the following data sources:

The GTEx Project
The Franke lab dataset
The Roadmap Epigenomics Project
The Corces et al. ATAC-seq dataset of 13 blood cell types
The ImmGen Project
The Cahoy Mouse Central Nervous System Dataset

Results

GTEx and Franke lab tissue expression data

When S-LDSC is applied to GWAS summary statistics using a reference dataset of cell types, it returns a coefficient \(\tau_i\) for each cell type \(i\), together with an associated \(p\)-value. A large coefficient and a small \(p\)-value for a given cell type \(i\) suggest that genes related to cell type \(i\) are over-represented in the heritability of the phenotype of interest.
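As a minimal sketch of how a coefficient relates to its \(p\)-value, the snippet below assumes the \(p\)-value is a one-sided normal test of \(\tau_i > 0\). This is an assumption, though it is consistent with the tables in this report, where every negative coefficient has a \(p\)-value above 0.5. The function name `one_sided_p` and the standard error used here are hypothetical; the tables report only the coefficient and the \(p\)-value.

```python
import math


def one_sided_p(coefficient: float, std_error: float) -> float:
    """Upper-tail normal p-value for H0: tau <= 0 versus H1: tau > 0."""
    z = coefficient / std_error
    # Survival function of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical inputs: the standard error is invented for illustration only.
p_pos = one_sided_p(2.0e-08, 6.0e-09)
p_neg = one_sided_p(-2.0e-08, 6.0e-09)
print(f"positive coefficient: p = {p_pos:.3g}")
print(f"negative coefficient: p = {p_neg:.3g}")
```

Under a one-sided test of this kind, a negative coefficient can never be significant, however large its magnitude.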

The graph below shows the coefficient \(p\)-values for the cell types in the GTEx/Franke lab dataset when S-LDSC is applied to the LDL GWAS. Cell types are grouped into categories according to the same scheme used in the original S-LDSC paper¹.

Of note: the three most significant cell types are all liver cell types. Consistent with the results of MAGMA, this suggests that the liver is central to the physiological processes determining LDL levels. Note, however, that since both S-LDSC and MAGMA use the GTEx dataset, this is not truly an independent piece of evidence.

There is also one significant kidney cell type (on the right side of the graph). It is unclear to me whether this reflects real biology, or is an artifact.

It is also of interest that several adipose-related cell types fall just short of the significance threshold, suggesting that a larger study might implicate adipose tissue in the regulation of LDL levels. This is consistent with the known association between high BMI and LDL levels.

Here are the top associations in tabular form:

Name	Coefficient	Coefficient_P_value	Reject Null
A03.620.Liver	1.5215e-08	0.000211744	True
A11.436.348.Hepatocytes	2.30756e-08	0.000254927	True
Liver	1.82315e-08	0.000335455	True
Kidney_Cortex	9.50672e-09	0.00045596	True
A10.165.114.830.500.750.Subcutaneous.Fat..Abdominal	1.13522e-08	0.00205337	False
A10.165.114.830.750.Subcutaneous.Fat	1.11345e-08	0.00237814	False
A11.497.497.600.Oocytes	7.96733e-09	0.00312142	False
A06.407.071.Adrenal.Glands	8.71681e-09	0.00736218	False
A15.382.490.315.583.Neutrophils	8.68251e-09	0.00957292	False
Pancreas	8.30563e-09	0.00992878	False
A05.360.490.Germ.Cells	6.45252e-09	0.0110824	False
A06.407.071.140.Adrenal.Cortex	7.33265e-09	0.013978	False
A03.556.249.124.Ileum	1.43982e-08	0.0146156	False
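This report does not state how the Reject Null column was computed. One common choice is Benjamini-Hochberg control of the false discovery rate, and the sketch below shows what that would look like. Everything about the procedure here is an assumption: the use of Benjamini-Hochberg, the 5% level, and especially `N_TESTS = 205` as the number of cell types tested. The function `bh_reject` is a hypothetical helper, and it only examines the rows listed above.

```python
def bh_reject(sorted_pvalues, n_tests, alpha=0.05):
    """Benjamini-Hochberg step-up over the smallest p-values of n_tests tests.

    sorted_pvalues must be in ascending order and may be truncated to the
    top rows; ranks beyond the listed rows are simply not examined.
    """
    last_pass = 0
    for rank, p in enumerate(sorted_pvalues, start=1):
        if p <= rank * alpha / n_tests:
            last_pass = rank
    # Step-up rule: reject every hypothesis ranked at or below the last pass.
    return [rank <= last_pass for rank in range(1, len(sorted_pvalues) + 1)]


# p-values copied from the table above, in the order shown.
top_pvalues = [
    0.000211744, 0.000254927, 0.000335455, 0.00045596, 0.00205337,
    0.00237814, 0.00312142, 0.00736218, 0.00957292, 0.00992878,
    0.0110824, 0.013978, 0.0146156,
]
N_TESTS = 205  # ASSUMPTION: total number of cell types tested; not stated in this report.

decisions = bh_reject(top_pvalues, N_TESTS)
print(decisions)
```

Under these assumptions the first four rows are rejected and the rest are not, which matches the Reject Null column above. That agreement is suggestive, but it does not confirm that this is the procedure actually used.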

Roadmap Chromatin data

I next applied S-LDSC to the LDL GWAS using reference data generated by Finucane et al.¹ from the Roadmap Epigenomics Project. This dataset annotates regions of the genome with the epigenetic marks present at those regions in particular tissues. These annotations can then be used for S-LDSC. The rationale is that if a GWAS finds a region of the genome to be significantly associated with a trait, and the epigenetic marks show that the region is actively regulated in a tissue, that tissue may be important to the trait.
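For reference, the regression behind these coefficients can be written out explicitly. The following is a sketch of the S-LDSC model as described by Finucane et al.¹; the symbols \(N\), \(a\), \(\ell(j, i)\), \(a_i(k)\), and \(r_{jk}\) follow that paper and are not otherwise used in this report:

\[
\mathbb{E}\left[\chi^2_j\right] = N \sum_{i} \tau_i \, \ell(j, i) + N a + 1,
\qquad
\ell(j, i) = \sum_{k} a_i(k) \, r_{jk}^2
\]

Here \(\chi^2_j\) is the GWAS association statistic for SNP \(j\), \(N\) is the GWAS sample size, \(a_i(k)\) is the value of annotation \(i\) at SNP \(k\) (for example, 1 if SNP \(k\) lies in a region carrying a given mark in a given tissue, and 0 otherwise), \(r_{jk}\) is the correlation between SNPs \(j\) and \(k\), and \(a\) absorbs confounding. The coefficient \(\tau_i\) reported for a tissue annotation is estimated jointly with a set of baseline annotations, so it reflects the annotation's contribution beyond that baseline.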

The following graph and table show the result:

Name	Coefficient	Coefficient_P_value	Reject Null
liver_ENTEX__H3K27ac	1.34601e-07	7.4945e-05	True
Liver__H3K27ac	1.83538e-07	8.36467e-05	True
Liver__H3K4me1	1.36859e-07	0.000236273	True
Adipose_Nuclei__H3K9ac	2.31322e-07	0.000333111	True
Adipose_Nuclei__H3K27ac	9.55665e-08	0.000401196	True
Liver__H3K9ac	3.69786e-07	0.000448776	True
Adipose_Nuclei__H3K4me1	9.55984e-08	0.000469102	True
liver_ENTEX__H3K4me3	4.77665e-07	0.000918027	False
Fetal_Adrenal_Gland__H3K4me3	3.23248e-07	0.00133247	False
Fetal_Intestine_Large__H3K4me1	1.07869e-07	0.0018057	False
Liver__H3K36me3	1.40152e-07	0.00203915	False
Liver__H3K4me3	3.02922e-07	0.00208174	False
Duodenum_Mucosa__H3K4me1	9.94451e-08	0.00223099	False

Note that all the histone modifications in the table above (H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K36me3) are associated with increased gene expression.

Consistent with the findings above, these results point to the liver as a key site of LDL physiology. Unlike the GTEx results above, they also implicate adipose tissue at a statistically significant level.

ImmGen data

I next used the S-LDSC reference data derived from the ImmGen project. This yields no significant cell-type hits. The top non-significant hits are:

Name	Coefficient	Coefficient_P_value	Reject Null
T.4SP69+.Th	2.09672e-08	0.00229273	False
T.4Nve.Sp	1.61296e-08	0.0091171	False
Mo.6C-IIint.Bl	1.24572e-08	0.0098925	False
T.4Nve.LN	1.37083e-08	0.0120702	False
T.4.LN.BDC	1.47838e-08	0.012287	False
LN.TR.14w.B6	1.06239e-08	0.0138859	False
MF.480int.LV.Naive	8.75148e-09	0.014214	False
MF.480hi.LV.Naive	9.3381e-09	0.0170341	False

This is to be expected, as LDL levels are not primarily the consequence of an immune process.

Corces ATAC-seq data

We can use another immunological reference dataset: the Corces ATAC-seq dataset. In this case, there are also no significant hits. The top non-significant hits are:

Name	Coefficient	Coefficient_P_value	Reject Null
Mono	3.39149e-08	0.236073	False
MEP	-8.09368e-09	0.588996	False
Erythro	-3.47455e-08	0.694423	False
NK	-3.37224e-08	0.753868	False
HSC	-2.53803e-08	0.772651	False
CMP	-2.52663e-08	0.777761	False
CD4	-4.49983e-08	0.783817	False
CLP	-7.29466e-08	0.886496	False

Cahoy and GTEx-Brain data

The next two reference datasets pertain to the central nervous system. Again, there are no significant hits in either. The results for the Cahoy dataset are:

Name	Coefficient	Coefficient_P_value	Reject Null
Astrocyte	1.97982e-09	0.291024	False
Neuron	2.33232e-10	0.466565	False
Oligodendrocyte	-5.54881e-09	0.936155	False

and the top non-significant hits for the GTEx brain dataset are:

Name	Coefficient	Coefficient_P_value	Reject Null
Brain_Nucleus_accumbens_(basal_ganglia)	6.56949e-09	0.0297949	False
Brain_Caudate_(basal_ganglia)	4.43273e-09	0.0727096	False
Brain_Putamen_(basal_ganglia)	3.45177e-09	0.0808186	False
Brain_Anterior_cingulate_cortex_(BA24)	1.83247e-09	0.249405	False
Brain_Cortex	1.85239e-09	0.282159	False
Brain_Cerebellar_Hemisphere	1.55714e-09	0.334468	False
Brain_Substantia_nigra	8.98458e-10	0.387369	False
Brain_Cerebellum	-9.68139e-10	0.584834	False

This is perhaps to be expected, given the lack of significant hits in any CNS-related category in the coarse-grained chromatin and gene expression datasets.

Reproducing Analysis

To reproduce this analysis, run the LDL GWAS Analysis Script.
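The script itself is not reproduced here. As a rough sketch of what a single cell-type run looks like with the public `ldsc` command-line tool, the snippet below assembles one invocation using the tool's `--h2-cts`, `--ref-ld-chr`, `--ref-ld-chr-cts`, `--w-ld-chr`, and `--out` options. The helper `build_ldsc_cts_command` and every file path are hypothetical placeholders, and the actual script may be organized differently.

```python
import shlex
from typing import List


def build_ldsc_cts_command(
    sumstats: str,
    ldcts: str,
    baseline_prefix: str,
    weights_prefix: str,
    out_prefix: str,
) -> List[str]:
    """Assemble the argument list for one ldsc cell-type-specific run."""
    return [
        "python", "ldsc.py",
        "--h2-cts", sumstats,            # munged GWAS summary statistics
        "--ref-ld-chr", baseline_prefix,  # baseline model LD scores
        "--ref-ld-chr-cts", ldcts,        # list of cell-type annotations to test
        "--w-ld-chr", weights_prefix,     # regression weights
        "--out", out_prefix,
    ]


# All paths below are placeholders, not the paths used in this analysis.
cmd = build_ldsc_cts_command(
    sumstats="ldl.sumstats.gz",
    ldcts="reference/cell_types.ldcts",
    baseline_prefix="reference/baseline.",
    weights_prefix="reference/weights.",
    out_prefix="results/ldl_cell_types",
)
print(shlex.join(cmd))
# To execute, pass cmd to subprocess.run(cmd, check=True) from the ldsc directory.
```

Building the command as a list rather than a single string makes it straightforward to loop over the reference datasets, one `.ldcts` file per run.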

¹ Hilary K Finucane, Yakir A Reshef, Verneri Anttila, Kamil Slowikowski, Alexander Gusev, Andrea Byrnes, Steven Gazal, Po-Ru Loh, Caleb Lareau, Noam Shoresh, and others. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature Genetics, 50(4):621–629, 2018. URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC5896795/.