
S-LDSC

I applied Stratified Linkage Disequilibrium Score Regression (S-LDSC) [1] to summary statistics from Willer et al.'s GWAS of LDL [2] to identify tissues and cell types that may play a key role in determining LDL levels.

Reference Data Sources

I used the reference datasets recommended and preprocessed by the authors of the S-LDSC method [1]. These reference datasets are ultimately drawn from the following data sources:

Results

GTEx and Franke lab tissue expression data

When S-LDSC is applied to GWAS summary statistics using a reference dataset of cell types, it returns a coefficient \(\tau_i\) for each cell type \(i\), together with an associated \(p\) value. A large coefficient and a small \(p\) value for cell type \(i\) suggest that genes related to that cell type are over-represented in the heritability of the phenotype of interest.
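For reference, the regression that S-LDSC fits can be sketched as follows. This is my summary of the model described in the S-LDSC literature, not something specific to this analysis, and the symbols other than \(\tau\) are notation introduced here: \(\chi^2_j\) is the GWAS association statistic for SNP \(j\), \(N\) is the GWAS sample size, \(a\) absorbs confounding, \(r^2_{jk}\) is the LD between SNPs \(j\) and \(k\), and \(c\) runs over all annotations in the model (the baseline annotations plus the annotation for cell type \(i\)).

\[
E\left[\chi^2_j\right] = N \sum_{c} \tau_c \, \ell(j, c) + N a + 1,
\qquad
\ell(j, c) = \sum_{k \in c} r^2_{jk}
\]

Here \(\ell(j, c)\) is the LD score of SNP \(j\) with respect to annotation \(c\). As I understand it, the reported \(p\) value for cell type \(i\) comes from a one-sided test of whether \(\tau_i > 0\) after conditioning on the other annotations in the model.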

The graph below shows the coefficient \(p\) values for the cell types in the GTEx/Franke lab dataset when S-LDSC is applied to the LDL GWAS. Cell types are grouped into categories according to the same scheme used in the original S-LDSC paper [1].

Results of applying S-LDSC to Willer et al.'s LDL GWAS using the GTEx/Franke lab dataset. Points are colored according to broad tissue category. Large points correspond to cell/tissue types deemed significant by the Benjamini-Hochberg procedure at an FDR of 0.01.
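The significance call used in the figure can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. The function name `benjamini_hochberg` and the tissue labels and p values in the example are hypothetical, chosen only for illustration; they are not the results of this analysis, and this is not necessarily the code used to produce the figure.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return one boolean per p value: True if that hypothesis is rejected.

    Implements the standard Benjamini-Hochberg step-up rule: find the
    largest rank k (1-based, ascending p values) with p_(k) <= (k / m) * fdr
    and reject every hypothesis ranked at or below k.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes

    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p values for illustration only (not results from this analysis).
example_p = {
    "tissue_a": 1e-6,
    "tissue_b": 2e-4,
    "tissue_c": 0.004,
    "tissue_d": 0.3,
    "tissue_e": 0.8,
}
flags = benjamini_hochberg(list(example_p.values()), fdr=0.01)
significant = [name for name, flag in zip(example_p, flags) if flag]
print(significant)
```

Because the rule is step-up, a p value can be rejected even if it misses its own rank threshold, as long as a larger p value further down the sorted list passes its threshold.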

Of note: all three significant cell/tissue types are liver-related, consistent with the known biology of LDL [3].

Roadmap Chromatin data

I next applied S-LDSC to the LDL GWAS using reference data generated by Finucane et al. [1] from the Roadmap Epigenomics Project. This dataset annotates regions of the genome with the epigenetic marks present on those regions in particular tissues, and these annotations can then be used for S-LDSC. The rationale is that if a GWAS finds a region of the genome to be significantly associated with a trait, and the epigenetic marks indicate that the region is active in a given tissue, that tissue may be important to the trait.
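To make the idea of an annotation concrete, here is a minimal sketch of how a binary SNP annotation could be derived from a set of marked regions. It assumes a single chromosome, half-open `(start, end)` intervals that do not overlap one another, and a hypothetical helper named `annotate_snps`. The preprocessed reference data I used already contains the annotations and their LD scores, so this is illustration only and not the authors' pipeline.

```python
from bisect import bisect_right


def annotate_snps(snp_positions, regions):
    """Return 1 for each SNP inside any region, else 0.

    `regions` is a list of half-open (start, end) intervals on one
    chromosome. They are assumed not to overlap one another.
    """
    regions = sorted(regions)
    starts = [start for start, _ in regions]

    flags = []
    for pos in snp_positions:
        # Index of the last region whose start is <= pos (or -1 if none).
        i = bisect_right(starts, pos) - 1
        inside = i >= 0 and pos < regions[i][1]
        flags.append(1 if inside else 0)
    return flags


# Hypothetical coordinates for illustration only.
example_regions = [(100, 200), (900, 1000)]
example_snps = [50, 150, 200, 999]
print(annotate_snps(example_snps, example_regions))
```

In S-LDSC, an indicator vector of this kind is what gets combined with LD information to form the annotation-specific LD scores used in the regression.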

Results of applying S-LDSC to Willer et al.'s LDL GWAS using the epigenetics reference dataset. Points are colored according to broad tissue category. Large points correspond to cell/tissue types deemed significant by the Benjamini-Hochberg procedure at an FDR of 0.01.

Again, I find liver-related tissues and cell types, consistent with the known biology of LDL.

ImmGen data

I next used the S-LDSC reference data derived from the ImmGen project. No cell types were significant with this dataset.

Corces ATAC-seq data

I next used the Corces ATAC-seq data as a reference. No cell types were significant with this dataset.

Cahoy and GTEx-Brain data

The remaining two datasets pertain to the central nervous system. No cell types were significant with either of these datasets, which is consistent with LDL being a largely non-neurological trait.


  1. Hilary K Finucane, Yakir A Reshef, Verneri Anttila, Kamil Slowikowski, Alexander Gusev, Andrea Byrnes, Steven Gazal, Po-Ru Loh, Caleb Lareau, Noam Shoresh, and others. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature Genetics, 50(4):621–629, 2018. URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC5896795/

  2. Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nature Genetics, 45(11):1274–1283, 2013. URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC3838666/

  3. Daniel Steinberg. The cholesterol wars: the skeptics vs the preponderance of evidence. Elsevier, 2011. URL: https://www.amazon.com/dp/B0085TMWZ4