mecfs_bio.build_system.task.mixer.mixer_task
Core task for fitting the MiXeR Gaussian mixture model to GWAS data
Classes:
- BivariateMode
- MixerDataSource – A source of data for use in MiXeR.
- MixerLDGenerationTask – Implemented by Claude to facilitate testing.
- MixerTask – Core task to fit the MiXeR Gaussian mixture model to GWAS data
- PreformattedMixerDataSource – A source for data that is already in MiXeR sumstats format
- UnivariateMode
Functions:
- default_mixer_extract_file_pattern_gen
- get_mixer_extract_args
- prepare_mixer_trait_input_file – Prepare a trait sumstats file in the temp dir, ready for MiXeR.
Attributes:
- CONTAINER_REF_DIR
- MIXER_CHROM_COL
- MIXER_EFFECTIVE_SAMPLE_SIZE
- MIXER_EFFECT_ALLELE_COL
- MIXER_FIT_JSON_PATTERN
- MIXER_NON_EFFECT_ALLELE_COL
- MIXER_POS_COL
- MIXER_RSID_COL
- MIXER_TEST_JSON_PATTERN
- MIXER_Z_SCORE_COL
- MixerMode
- logger
BivariateMode
Attributes:
MixerDataSource
A source of data for use in MiXeR. The task should provide a dataframe in gwaslab format, which is converted to MiXeR format by renaming columns and computing Z = BETA/SE.
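The conversion described above can be sketched roughly as follows. This is an illustration, not the code used by mecfs_bio: the gwaslab-style input column names (rsID, EA, NEA, BETA, SE) and the choice to drop rows with a non-positive standard error are assumptions, and the MiXeR column names are taken from the sumstats format listed for PreformattedMixerDataSource.

```python
from __future__ import annotations

import math

# Assumed mapping from gwaslab-style names to MiXeR names (illustrative only).
GWASLAB_TO_MIXER = {
    "rsID": "RSID",
    "CHR": "CHR",
    "POS": "POS",
    "EA": "EffectAllele",
    "NEA": "OtherAllele",
    "N": "N",
}


def to_mixer_row(row: dict) -> dict | None:
    """Rename columns and compute Z = BETA / SE; return None when Z is undefined."""
    se = float(row["SE"])
    if math.isnan(se) or se <= 0:
        # Assumption: rows without a usable standard error are dropped.
        return None
    out = {mixer: row[gwaslab] for gwaslab, mixer in GWASLAB_TO_MIXER.items()}
    out["Z"] = float(row["BETA"]) / se
    return out


rows = [
    {"rsID": "rs1", "CHR": 1, "POS": 1000, "EA": "A", "NEA": "G",
     "BETA": 0.10, "SE": 0.05, "N": 5000},
    {"rsID": "rs2", "CHR": 1, "POS": 2000, "EA": "C", "NEA": "T",
     "BETA": -0.03, "SE": 0.0, "N": 5000},
]
converted = [r for r in map(to_mixer_row, rows) if r is not None]
```

Here the second row is skipped because its standard error is zero, so only rs1 appears in converted.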
Attributes:
- alias (str)
- asset_id (AssetId)
- pipe (DataProcessingPipe)
- sample_info (PhenotypeInfo)
- task (Task)
MixerLDGenerationTask
Bases: Task
Implemented by Claude to facilitate testing. Generates .ld files from PLINK .bed/.bim/.fam files using the mixer ld command, then copies all source files plus the generated .ld files to the output directory.
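As a rough illustration of the per-chromosome LD step, the sketch below builds one argument list per chromosome. Everything in it is an assumption rather than the task's actual code: the helper name build_ld_commands, the use of "@" as the chromosome placeholder in bfile_prefix_pattern, the default values, and the flag spellings (--bfile, --out, --r2min, --ldscore-r2min, --ld-window-kb), which should be checked against the MiXeR version in your Docker image.

```python
from pathlib import Path


def build_ld_commands(
    bfile_prefix_pattern: str,
    chromosomes: tuple[int, ...],
    out_dir: Path,
    r2min: str = "0.05",          # hypothetical default
    ldscore_r2min: str = "0.05",  # hypothetical default
    ld_window_kb: str = "30000",  # hypothetical default
) -> list[list[str]]:
    """Return one `mixer ld` argument list per chromosome (nothing is executed)."""
    commands = []
    for chrom in chromosomes:
        # Assumption: "@" marks where the chromosome number goes.
        prefix = bfile_prefix_pattern.replace("@", str(chrom))
        commands.append([
            "mixer", "ld",
            "--bfile", prefix,
            "--out", str(out_dir / f"{Path(prefix).name}.ld"),
            "--r2min", r2min,
            "--ldscore-r2min", ldscore_r2min,
            "--ld-window-kb", ld_window_kb,
        ])
    return commands


cmds = build_ld_commands("plink/panel.@", (21, 22), Path("out"))
```

Building the argument lists separately from running them keeps the step easy to inspect in a test before any container is started.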
Methods:
- execute
Attributes:
- bfile_prefix_pattern (str)
- chromosomes (tuple[int, ...])
- deps (list[Task])
- ld_window_kb (str)
- ldscore_r2min (str)
- meta (Meta)
- plink_data_task (Task)
- r2min (str)
bfile_prefix_pattern (class-attribute, instance-attribute)
execute
Source code in mecfs_bio/build_system/task/mixer/mixer_task.py
MixerTask
Bases: Task
Core task to fit the MiXeR Gaussian mixture model to GWAS data
See: Holland, Dominic, et al. "Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model." PLoS Genetics 16.5 (2020): e1008612.
The MiXeR software is distributed as a Docker image. Before running MixerTask, verify that Docker is installed and that your user account belongs to the docker group.
The MiXeR authors have split up the genetic variants in their reference panel into 20 random subsets. The recommended MiXeR workflow is to run MiXeR on your GWAS data using each of these 20 random subsets, then combine the results. Specify which of these random subsets to run using the reps_to_perform attribute.
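The per-subset workflow can be pictured as one set of extra arguments per replicate. The sketch below is an assumption about how a pattern generator and get_mixer_extract_args might fit together; the file naming in example_extract_file_pattern_gen is hypothetical, the --extract flag should be checked against your MiXeR version, and PurePosixPath is used only so that the example prints the same paths on every platform.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Callable


def example_extract_file_pattern_gen(rep: int) -> str:
    # Hypothetical file naming; the real default pattern is not shown on this page.
    return f"snps/subset.rep{rep}.snps"


def extract_args_sketch(
    extract_file_pattern_gen: Callable[[int], str] | None,
    rep: int,
    reference_dir_path: PurePosixPath,
) -> list[str]:
    """Return extra command-line arguments selecting the SNP subset for one replicate."""
    if extract_file_pattern_gen is None:
        # No generator supplied: run without restricting to a subset.
        return []
    return ["--extract", str(reference_dir_path / extract_file_pattern_gen(rep))]


reps_to_perform = tuple(range(1, 21))  # the 20 random subsets, numbered 1 to 20
per_rep_args = {
    rep: extract_args_sketch(example_extract_file_pattern_gen, rep, PurePosixPath("/ref"))
    for rep in reps_to_perform
}
```

Passing a shorter sequence, such as (1, 2), as reps_to_perform would restrict a run to the first two subsets, which can be useful for a quick smoke test.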
Methods:
- create
- execute
Attributes:
- bim_file_pattern (str)
- chr_to_use_arg (str | None)
- deps (list[Task])
- extra_args (Sequence[str])
- extract_file_pattern_gen (Callable[[int], str] | None)
- ld_file_pattern (str)
- meta (Meta)
- reference_data_directory_task (Task)
- reps_to_perform (Sequence[int])
- threads (int)
- trait_1_source (MixerDataSource | PreformattedMixerDataSource)
bim_file_pattern (class-attribute, instance-attribute)
ld_file_pattern (class-attribute, instance-attribute)
reps_to_perform (class-attribute, instance-attribute)
create (classmethod)
create(
    asset_id: str,
    trait_1_source: MixerDataSource | PreformattedMixerDataSource,
    ref_data_directory_task: Task,
    extra_args: Sequence[str] = tuple(),
    ld_file_pattern: str = "1000G_EUR_Phase3_plink/1000G.EUR.QC.@.run4.ld",
    bim_file_pattern: str = "1000G_EUR_Phase3_plink/1000G.EUR.QC.@.bim",
    extract_file_pattern_gen: Callable[[int], str] | None = default_mixer_extract_file_pattern_gen,
    threads: int = 4,
    reps_to_perform: Sequence[int] = tuple(range(1, 21)),
    chr_args: str | None = None,
)
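The default ld_file_pattern and bim_file_pattern contain an "@" character. Assuming this is a per-chromosome placeholder and that the reference panel covers autosomes 1 to 22 (neither is stated on this page), a pre-flight check for missing reference files might look like the sketch below. The helper missing_reference_files is hypothetical and not part of mecfs_bio.

```python
import tempfile
from pathlib import Path

LD_FILE_PATTERN = "1000G_EUR_Phase3_plink/1000G.EUR.QC.@.run4.ld"
BIM_FILE_PATTERN = "1000G_EUR_Phase3_plink/1000G.EUR.QC.@.bim"


def missing_reference_files(reference_dir, patterns, chromosomes=range(1, 23)):
    """List the per-chromosome files that the patterns imply but that do not exist."""
    missing = []
    for pattern in patterns:
        for chrom in chromosomes:
            # Assumption: "@" is replaced by the chromosome number.
            candidate = Path(reference_dir) / pattern.replace("@", str(chrom))
            if not candidate.exists():
                missing.append(candidate)
    return missing


# Demo against a temporary directory that contains only the chromosome 1 .ld file.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / LD_FILE_PATTERN.replace("@", "1")
    present.parent.mkdir(parents=True)
    present.touch()
    missing = missing_reference_files(tmp, [LD_FILE_PATTERN, BIM_FILE_PATTERN])
    n_missing = len(missing)
```

Running a check of this kind before launching the container can surface an incomplete reference download early, instead of partway through a long fit.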
Source code in mecfs_bio/build_system/task/mixer/mixer_task.py
execute
Source code in mecfs_bio/build_system/task/mixer/mixer_task.py
PreformattedMixerDataSource
A source for data that is already in MiXeR sumstats format (RSID, CHR, POS, EffectAllele, OtherAllele, Z, N). No gwaslab-to-mixer column conversion is performed. The task should provide a DirectoryAsset or FileAsset containing the sumstats file.
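Because no column conversion is performed for this source, it may be worth checking the header of a preformatted file before handing it to MiXeR. The sketch below assumes a tab-delimited file with a header row; the helper name missing_mixer_columns is hypothetical, and the column names are the ones listed above rather than the values of the module's MIXER_*_COL constants, which are not shown on this page.

```python
import csv
import io

# Column names as listed in the description of the MiXeR sumstats format.
MIXER_COLUMNS = ("RSID", "CHR", "POS", "EffectAllele", "OtherAllele", "Z", "N")


def missing_mixer_columns(handle) -> list:
    """Return the expected MiXeR columns that are absent from the file header."""
    reader = csv.reader(handle, delimiter="\t")  # assumption: tab-delimited input
    header = next(reader, [])
    return [col for col in MIXER_COLUMNS if col not in header]


good = io.StringIO("RSID\tCHR\tPOS\tEffectAllele\tOtherAllele\tZ\tN\nrs1\t1\t1000\tA\tG\t2.0\t5000\n")
bad = io.StringIO("RSID\tCHR\tPOS\tEffectAllele\tOtherAllele\tBETA\tSE\n")

good_missing = missing_mixer_columns(good)
bad_missing = missing_mixer_columns(bad)
```

In this example the second header still carries BETA and SE instead of Z and N, so it would need the gwaslab-style conversion path rather than the preformatted one.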
Attributes:
UnivariateMode
default_mixer_extract_file_pattern_gen
get_mixer_extract_args
get_mixer_extract_args(
    extract_file_pattern_gen: Callable[[int], str] | None,
    rep: int,
    reference_dir_path: Path,
) -> list[str]
Source code in mecfs_bio/build_system/task/mixer/mixer_task.py
prepare_mixer_trait_input_file
prepare_mixer_trait_input_file(
    source: MixerDataSource | PreformattedMixerDataSource,
    fetch: Fetch,
    temp_dir: Path,
) -> Path
Prepare a trait sumstats file in the temp dir, ready for MiXeR.