Bases: Task
A task that generates a list of lead variants from summary statistics.
Uses GWASLab.
See: https://cloufield.github.io/gwaslab/utility_get_lead_novel/
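To make "lead variants" concrete, here is a simplified, hypothetical sketch of distance-based lead-variant selection in pandas. It is not GWASLab's implementation and not how this task computes its output (the task delegates to `sumstats.get_lead()`). It assumes columns named `CHR`, `POS` (base pairs) and `P`, a genome-wide threshold passed as `sig_level`, and a 500 kb window chosen purely for illustration. The function name `simple_lead_variants` is invented for this example.

```python
import pandas as pd


def simple_lead_variants(
    df: pd.DataFrame, sig_level: float = 5e-8, window_kb: int = 500
) -> pd.DataFrame:
    """Greedy, distance-based lead-variant selection (illustrative sketch only).

    Keeps variants with P < sig_level, then repeatedly takes the most
    significant remaining variant as a lead and discards every other
    significant variant within window_kb on the same chromosome.
    """
    # Significant variants, most significant first, with a clean 0..n-1 index.
    sig = df[df["P"] < sig_level].sort_values("P").reset_index(drop=True)
    if sig.empty:
        return sig  # no hits: empty frame that still has the input columns

    window_bp = window_kb * 1000
    lead_idx = []
    remaining = sig
    while not remaining.empty:
        top = remaining.iloc[0]
        lead_idx.append(remaining.index[0])
        # Drop the lead itself and all of its neighbours on the same chromosome.
        near = (remaining["CHR"] == top["CHR"]) & (
            (remaining["POS"] - top["POS"]).abs() <= window_bp
        )
        remaining = remaining[~near]

    return sig.loc[lead_idx].sort_values(["CHR", "POS"]).reset_index(drop=True)


# Toy summary statistics (invented values for the example).
example = pd.DataFrame(
    {
        "SNPID": ["a", "b", "c", "d", "e"],
        "CHR": [1, 1, 1, 2, 2],
        "POS": [1_000_000, 1_200_000, 3_000_000, 500_000, 600_000],
        "P": [1e-10, 1e-9, 1e-8, 0.01, 2e-9],
    }
)
leads = simple_lead_variants(example)
print(leads)
```

Here variant `b` is absorbed by the stronger nearby lead `a`, `d` is not significant, and `a`, `c` and `e` remain as leads. Real tools typically add refinements (for example, gene annotation via `anno=True` in GWASLab) that this sketch omits.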
Methods:
execute

Attributes:
meta: GWASLabLeadVariantsMeta
short_id: AssetId = field(converter=AssetId) (class-attribute, instance-attribute)
sig_level (class-attribute, instance-attribute)
execute
execute(scratch_dir: Path, fetch: Fetch, wf: WF) -> Asset
Source code in mecfs_bio/build_system/task/gwaslab/gwaslab_lead_variants_task.py
```python
def execute(self, scratch_dir: Path, fetch: Fetch, wf: WF) -> Asset:
    # Load the input summary statistics as a GWASLab Sumstats object.
    sumstats_asset = fetch(self._input_asset_id)
    sumstats: gl.Sumstats = read_sumstats(sumstats_asset)
    # Extract lead variants at the configured threshold; anno=True requests
    # gene annotation (the GENE column).
    variant_df = sumstats.get_lead(anno=True, sig_level=self.sig_level)
    # GWASLab (sumstats.get_lead()) does not add the GENE column if no
    # significant variants are found. See:
    # https://github.com/Cloufield/gwaslab/blob/eb4daf636f6e171d74a80c16723293734e851ee2/src/gwaslab/util/util_in_get_sig.py#L164-L177
    # However, the GENE column is required by downstream tasks.
    if "GENE" not in variant_df.columns:
        variant_df["GENE"] = None
    # Write the (possibly empty) lead-variant table and return it as a file asset.
    out_path = scratch_dir / "lead_variants.csv"
    variant_df.to_csv(out_path, index=False)
    return FileAsset(out_path)
```
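The GENE fallback in `execute` matters most when there are no hits: the CSV then has zero rows, so its header is the only thing a downstream task can check. The standalone pandas sketch below illustrates this. The helper `ensure_gene_column` and the non-GENE column names are hypothetical, and the empty DataFrame merely simulates what `get_lead()` might return when nothing reaches `sig_level`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def ensure_gene_column(variant_df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame that is guaranteed to have a GENE column (hypothetical helper)."""
    if "GENE" not in variant_df.columns:
        # assign() returns a new frame instead of mutating the caller's data.
        variant_df = variant_df.assign(GENE=None)
    return variant_df


# Simulate the no-significant-variants case: an empty frame without GENE.
no_hits = pd.DataFrame(columns=["SNPID", "CHR", "POS", "P"])
fixed = ensure_gene_column(no_hits)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "lead_variants.csv"
    fixed.to_csv(out_path, index=False)
    # A downstream reader still sees the GENE header even with zero rows.
    round_trip = pd.read_csv(out_path)

print(list(round_trip.columns))
```

Using `assign` keeps the helper free of side effects. The task itself mutates `variant_df` in place, which is harmless there because the frame is local to `execute`.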