S-LDSC LDL Analysis

I applied Stratified Linkage Disequilibrium Score Regression (S-LDSC) to summary statistics from a GWAS of LDL from the Million Veteran Program to identify tissues and cell types that may play a key role in determining LDL levels.

Reference Data Sources

I used the reference datasets recommended and preprocessed by the authors of the S-LDSC method. These reference datasets are ultimately drawn from the following data sources:

Results

GTEx and Franke lab tissue expression data

When S-LDSC is applied to GWAS summary statistics using a reference dataset of cell types, it returns a coefficient \(\tau_i\) for each cell type \(i\), together with an associated \(p\)-value. A large coefficient and a small \(p\)-value for a given cell type \(i\) suggest that genes related to cell type \(i\) are over-represented in the heritability of the phenotype of interest.
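As a rough illustration of how a coefficient turns into a \(p\)-value, the sketch below assumes the reported value is a one-sided normal test of \(\tau_i > 0\). That is an assumption, although it is consistent with the negative coefficients in the tables further down carrying \(p\)-values above 0.5. The standard error used here is hypothetical, since this write-up does not report standard errors.

```python
import math


def one_sided_p(coefficient: float, std_error: float) -> float:
    """Upper-tail p-value for H0: tau <= 0 under a normal approximation."""
    z = coefficient / std_error
    # Survival function of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical inputs: the standard error is made up for illustration only.
p_pos = one_sided_p(1.5e-08, 5.0e-09)   # z = 3, small p-value
p_neg = one_sided_p(-1.5e-08, 5.0e-09)  # z = -3, p-value close to 1
print(p_pos, p_neg)
```

Under this reading, a negative coefficient can never be significant, which is why only positive coefficients appear among the top hits.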

The graph below shows the coefficient \(p\)-values for the cell types in the GTEx/Franke lab dataset when S-LDSC is applied to the LDL GWAS. Cell types are grouped into categories according to the same scheme used in the original S-LDSC paper [1].

[Figure: scatter plot of coefficient p-values by cell type for the GTEx/Franke lab dataset]

Of note: the three most significant cell types are all liver cell types. Consistent with the results of MAGMA, this suggests that the liver is central to the physiological process determining LDL levels. Note, however, that since both S-LDSC and MAGMA use the GTEx dataset, this is not truly an independent piece of evidence.

There is also one significant kidney cell type (on the right side of the graph). It is unclear to me whether this reflects real biology, or is an artifact.

It is also of interest that several adipose-related cell types are just below the threshold of significance, suggesting that a larger study might implicate adipose tissue in the LDL disease process. This is consistent with the known association of high BMI with LDL levels.

Here are the top associations in tabular form:

| Name | Coefficient | Coefficient_P_value | Reject Null |
| --- | --- | --- | --- |
| A03.620.Liver | 1.5215e-08 | 0.000211744 | True |
| A11.436.348.Hepatocytes | 2.30756e-08 | 0.000254927 | True |
| Liver | 1.82315e-08 | 0.000335455 | True |
| Kidney_Cortex | 9.50672e-09 | 0.00045596 | True |
| A10.165.114.830.500.750.Subcutaneous.Fat..Abdominal | 1.13522e-08 | 0.00205337 | False |
| A10.165.114.830.750.Subcutaneous.Fat | 1.11345e-08 | 0.00237814 | False |
| A11.497.497.600.Oocytes | 7.96733e-09 | 0.00312142 | False |
| A06.407.071.Adrenal.Glands | 8.71681e-09 | 0.00736218 | False |
| A15.382.490.315.583.Neutrophils | 8.68251e-09 | 0.00957292 | False |
| Pancreas | 8.30563e-09 | 0.00992878 | False |
| A05.360.490.Germ.Cells | 6.45252e-09 | 0.0110824 | False |
| A06.407.071.140.Adrenal.Cortex | 7.33265e-09 | 0.013978 | False |
| A03.556.249.124.Ileum | 1.43982e-08 | 0.0146156 | False |
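This write-up does not say how the Reject Null column was derived. One common choice for this kind of analysis is the Benjamini-Hochberg procedure, which controls the false discovery rate. The sketch below is a minimal implementation under that assumption. The function name, the `n_tests` parameter, and the total of 205 cell types are all illustrative and do not come from the analysis itself.

```python
def benjamini_hochberg(p_values, alpha=0.05, n_tests=None):
    """Return a list of booleans: True where the null is rejected.

    n_tests is the total number of hypotheses; it defaults to len(p_values).
    Passing a larger n_tests treats p_values as the smallest p-values of a
    longer list whose remaining entries are assumed not to be rejected.
    """
    m = len(p_values) if n_tests is None else n_tests
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    reject = [False] * len(p_values)
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= cutoff_rank
    return reject


# p-values of the first six rows of the table above.
top_p = [0.000211744, 0.000254927, 0.000335455, 0.00045596, 0.00205337, 0.00237814]
# n_tests=205 is an assumed total number of cell types, not a figure from this write-up.
decisions = benjamini_hochberg(top_p, alpha=0.05, n_tests=205)
print(decisions)
```

With that assumed total, the sketch rejects the null for the first four rows and not for the next two, which matches the table. The agreement does not establish which procedure was actually used, and applying the rule to a truncated list is only illustrative.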

Roadmap Chromatin data

I next applied S-LDSC to the LDL GWAS using reference data generated by Finucane et al. [1] from the Roadmap Epigenomics Project. This dataset annotates regions of the genome with the epigenetic marks present on those regions in particular tissues. These annotations can then be used for S-LDSC. The rationale is that if a GWAS finds a region of the genome to be significantly associated with a trait, and the epigenetic data show that the region is being up-regulated in a tissue, that tissue may be important to the trait.

The following graph and table show the result:

[Figure: scatter plot of coefficient p-values for the Roadmap chromatin annotations]

| Name | Coefficient | Coefficient_P_value | Reject Null |
| --- | --- | --- | --- |
| liver_ENTEX__H3K27ac | 1.34601e-07 | 7.4945e-05 | True |
| Liver__H3K27ac | 1.83538e-07 | 8.36467e-05 | True |
| Liver__H3K4me1 | 1.36859e-07 | 0.000236273 | True |
| Adipose_Nuclei__H3K9ac | 2.31322e-07 | 0.000333111 | True |
| Adipose_Nuclei__H3K27ac | 9.55665e-08 | 0.000401196 | True |
| Liver__H3K9ac | 3.69786e-07 | 0.000448776 | True |
| Adipose_Nuclei__H3K4me1 | 9.55984e-08 | 0.000469102 | True |
| liver_ENTEX__H3K4me3 | 4.77665e-07 | 0.000918027 | False |
| Fetal_Adrenal_Gland__H3K4me3 | 3.23248e-07 | 0.00133247 | False |
| Fetal_Intestine_Large__H3K4me1 | 1.07869e-07 | 0.0018057 | False |
| Liver__H3K36me3 | 1.40152e-07 | 0.00203915 | False |
| Liver__H3K4me3 | 3.02922e-07 | 0.00208174 | False |
| Duodenum_Mucosa__H3K4me1 | 9.94451e-08 | 0.00223099 | False |

Note that all the histone modifications in the table above (H3K27ac, H3K4me1, H3K4me3, H3K9ac, H3K36me3) are associated with increased gene expression.

Consistent with our findings above, these results point to the liver as a key site of LDL physiology. Unlike the GTEx results above, they also significantly implicate adipose tissue as a site of important physiology.

ImmGen data

I next used the S-LDSC reference data derived from the ImmGen project. This yielded no significant cell-type hits. The top non-significant hits are:

| Name | Coefficient | Coefficient_P_value | Reject Null |
| --- | --- | --- | --- |
| T.4SP69+.Th | 2.09672e-08 | 0.00229273 | False |
| T.4Nve.Sp | 1.61296e-08 | 0.0091171 | False |
| Mo.6C-IIint.Bl | 1.24572e-08 | 0.0098925 | False |
| T.4Nve.LN | 1.37083e-08 | 0.0120702 | False |
| T.4.LN.BDC | 1.47838e-08 | 0.012287 | False |
| LN.TR.14w.B6 | 1.06239e-08 | 0.0138859 | False |
| MF.480int.LV.Naive | 8.75148e-09 | 0.014214 | False |
| MF.480hi.LV.Naive | 9.3381e-09 | 0.0170341 | False |

This is to be expected, as LDL levels are not primarily the consequence of an autoimmune process.

Corces ATAC-seq data

We can use another immunological reference dataset: the Corces ATAC-seq dataset. In this case, there are also no significant hits. The top non-significant hits are:

| Name | Coefficient | Coefficient_P_value | Reject Null |
| --- | --- | --- | --- |
| Mono | 3.39149e-08 | 0.236073 | False |
| MEP | -8.09368e-09 | 0.588996 | False |
| Erythro | -3.47455e-08 | 0.694423 | False |
| NK | -3.37224e-08 | 0.753868 | False |
| HSC | -2.53803e-08 | 0.772651 | False |
| CMP | -2.52663e-08 | 0.777761 | False |
| CD4 | -4.49983e-08 | 0.783817 | False |
| CLP | -7.29466e-08 | 0.886496 | False |

Cahoy and GTEx-Brain data

The next two reference datasets pertain to the central nervous system. Again, there are no significant hits. The top non-significant hits are:

| Name | Coefficient | Coefficient_P_value | Reject Null |
| --- | --- | --- | --- |
| Astrocyte | 1.97982e-09 | 0.291024 | False |
| Neuron | 2.33232e-10 | 0.466565 | False |
| Oligodendrocyte | -5.54881e-09 | 0.936155 | False |

and

| Name | Coefficient | Coefficient_P_value | Reject Null |
| --- | --- | --- | --- |
| Brain_Nucleus_accumbens_(basal_ganglia) | 6.56949e-09 | 0.0297949 | False |
| Brain_Caudate_(basal_ganglia) | 4.43273e-09 | 0.0727096 | False |
| Brain_Putamen_(basal_ganglia) | 3.45177e-09 | 0.0808186 | False |
| Brain_Anterior_cingulate_cortex_(BA24) | 1.83247e-09 | 0.249405 | False |
| Brain_Cortex | 1.85239e-09 | 0.282159 | False |
| Brain_Cerebellar_Hemisphere | 1.55714e-09 | 0.334468 | False |
| Brain_Substantia_nigra | 8.98458e-10 | 0.387369 | False |
| Brain_Cerebellum | -9.68139e-10 | 0.584834 | False |

This is perhaps to be expected, given the lack of significant hits on any CNS-related categories in the coarse-grained chromatin and gene expression datasets.

Reproducing Analysis

To reproduce this analysis, run the LDL GWAS Analysis Script.
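The linked script remains the authoritative way to reproduce these results. Purely as orientation, the sketch below assembles the kind of `ldsc.py` cell-type-specific command described in the LDSC documentation, once per reference dataset. The summary-statistics file name, the output prefix, the directory layout, and the helper function are hypothetical and may differ from what the script actually does. The sketch only prints the commands and does not run them.

```python
import shlex


def build_ldsc_cts_command(sumstats_path, cts_name, out_prefix):
    """Assemble an argument list for an ldsc.py cell-type-specific run."""
    return [
        "python", "ldsc.py",
        "--h2-cts", sumstats_path,                              # munged GWAS summary statistics
        "--ref-ld-chr", "1000G_EUR_Phase3_baseline/baseline.",  # baseline model LD scores
        "--ref-ld-chr-cts", f"{cts_name}.ldcts",                # cell-type annotation index
        "--w-ld-chr", "weights_hm3_no_hla/weights.",            # regression weights
        "--out", f"{out_prefix}_{cts_name}",
    ]


# Reference dataset names assumed to match the sections of this write-up.
CTS_NAMES = [
    "Multi_tissue_gene_expr",
    "Multi_tissue_chromatin",
    "ImmGen",
    "Corces_ATAC",
    "Cahoy",
    "GTEx_brain",
]

# "ldl.sumstats.gz" and the "ldl" prefix are placeholder names.
commands = [build_ldsc_cts_command("ldl.sumstats.gz", name, "ldl") for name in CTS_NAMES]
for cmd in commands:
    print(shlex.join(cmd))
```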


  1. Hilary K Finucane, Yakir A Reshef, Verneri Anttila, Kamil Slowikowski, Alexander Gusev, Andrea Byrnes, Steven Gazal, Po-Ru Loh, Caleb Lareau, Noam Shoresh, and others. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature Genetics, 50(4):621–629, 2018. URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC5896795/