mecfs_bio.build_system.task.gwaslab.gwaslab_lead_variants_task

Classes:

GwasLabLeadVariantsTask

Bases: Task

A task that generates a list of lead variants from summary statistics using GWASLab. See: https://cloufield.github.io/gwaslab/utility_get_lead_novel/

Methods:

execute

Attributes:

deps
meta
short_id
sig_level

deps property

deps: list[Task]

meta property

meta: GWASLabLeadVariantsMeta

short_id class-attribute instance-attribute

short_id: AssetId = field(converter=AssetId)

sig_level class-attribute instance-attribute

sig_level: float = 5e-08

execute

execute(scratch_dir: Path, fetch: Fetch, wf: WF) -> Asset
Source code in mecfs_bio/build_system/task/gwaslab/gwaslab_lead_variants_task.py
def execute(self, scratch_dir: Path, fetch: Fetch, wf: WF) -> Asset:
    sumstats_asset = fetch(self._input_asset_id)
    sumstats: gl.Sumstats = read_sumstats(sumstats_asset)
    variant_df = sumstats.get_lead(anno=True, sig_level=self.sig_level)

    # GWASLab (sumstats.get_lead()) does not add the GENE column if no
    # significant variants are found. See:
    # https://github.com/Cloufield/gwaslab/blob/eb4daf636f6e171d74a80c16723293734e851ee2/src/gwaslab/util/util_in_get_sig.py#L164-L177
    # However, the GENE column is required by downstream tasks, so add an
    # empty one when it is missing.
    if "GENE" not in variant_df.columns:
        variant_df["GENE"] = None

    out_path = scratch_dir / "lead_variants.csv"
    variant_df.to_csv(out_path, index=False)
    return FileAsset(out_path)
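
The GENE-column guard in execute can be tried in isolation with plain pandas. The sketch below is illustrative only: ensure_gene_column is a hypothetical helper that is not part of mecfs_bio, and the column names SNPID, CHR, POS and P are assumed stand-ins for whatever get_lead() actually returns. It shows that adding the column to an empty frame still produces a CSV header that downstream readers can rely on.

```python
import io

import pandas as pd


def ensure_gene_column(variant_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy that is guaranteed to have a GENE column."""
    out = variant_df.copy()
    if "GENE" not in out.columns:
        # No annotation was produced, so fill the column with missing values.
        out["GENE"] = None
    return out


# Empty frame standing in for a lead-variant result with no significant hits.
# The column names are assumptions made for this example.
empty_leads = pd.DataFrame(columns=["SNPID", "CHR", "POS", "P"])
fixed = ensure_gene_column(empty_leads)

# Write to an in-memory buffer the same way execute writes lead_variants.csv.
buffer = io.StringIO()
fixed.to_csv(buffer, index=False)
header = buffer.getvalue().splitlines()[0]

# A frame that already carries GENE passes through with its values intact.
annotated = pd.DataFrame({"SNPID": ["rs1"], "GENE": ["ABC1"]})
unchanged = ensure_gene_column(annotated)
```

Unlike execute, which modifies variant_df in place, this helper works on a copy so the input frame can be compared before and after.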