mecfs_bio.build_system.task.gene_manhattan_plot_task
Task to produce an interactive gene-level Manhattan plot as an HTML file.
Supports two source types:
- MagmaGeneSource: read a MAGMA gene-level analysis output directory (the .genes.out file produced by MagmaGeneAnalysisTask) and join a gene thesaurus to translate Ensembl IDs into human-readable gene names.
- GenePValueTableSource: read an arbitrary table of (gene_ensembl_id, p_value) rows and look up chromosomal locations and human-readable gene names from a gene-locations reference (such as MAGMA_ENSEMBL_GENE_LOCATION_REFERENCE_DATA_BUILD_37_RAW). Intended for rare-variant test output or any other gene-level result table.
The plot uses Plotly's WebGL Scattergl renderer for performance with
20k-30k gene points and exposes hover text containing the gene name, Ensembl
ID, chromosome, genomic midpoint position (labelled Position (hg19) or
Position (hg38) according to the source's declared genome_build), and
-log10(p).
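The hover text described above can be sketched as a small helper. This is a minimal illustration, not the module's implementation: the function name hover_text, the POSITION_LABELS mapping, the string build keys "37" and "38", and the field captions other than the position label are assumptions, and the gene used is made up.

```python
import math

# Hypothetical mapping from a build identifier to the hover label.
# The real module uses a GenomeBuild type whose values are not shown here.
POSITION_LABELS = {"37": "Position (hg19)", "38": "Position (hg38)"}


def hover_text(gene_name, ensembl_id, chrom, pos, p_value, genome_build):
    """Compose one point's hover text; Plotly renders <br> as a line break."""
    label = POSITION_LABELS[genome_build]
    neg_log10_p = -math.log10(p_value)
    return "<br>".join(
        [
            f"Gene: {gene_name}",
            f"Ensembl ID: {ensembl_id}",
            f"Chromosome: {chrom}",
            f"{label}: {pos:,}",
            f"-log10(p): {neg_log10_p:.2f}",
        ]
    )


# Made-up gene record, for illustration only.
example = hover_text("GENE_A", "ENSG00000000001", "13", 32931500, 1e-6, "37")
```

Keeping the label lookup in one mapping means a source that declares build 38 changes only the caption, not the position values themselves.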
Classes:
- GeneManhattanPlotTask – Create an interactive HTML gene-level Manhattan plot.
- GeneManhattanSource – A source that yields rows of (chrom, pos, ensembl_id, gene_name, p) for a Manhattan plot.
- GenePValueTableSource – Load a Manhattan-plot table from an arbitrary (gene, p-value) table.
- MagmaGeneSource – Load a Manhattan-plot table from a MagmaGeneAnalysisTask.
Functions:
- build_manhattan_plot – Construct a Plotly figure containing a gene-level Manhattan plot.
Attributes:
- GeneIdKind
- logger
GeneManhattanPlotTask
Bases: Task
Create an interactive HTML gene-level Manhattan plot.
Backed by Plotly's WebGL renderer (Scattergl) so that hover stays
responsive at gene-scale point counts (~20k-30k).
Methods:
- create
- execute
Attributes:
- colors (tuple[str, str])
- deps (list[Task])
- meta (Meta)
- plotly_js_mode (bool | PlotlyWriteMode)
- point_size (int)
- sig_line_color (str)
- sig_threshold (float | None)
- source (GeneManhattanSource)
- title (str | None)
create
classmethod
create(
asset_id: str,
source: GeneManhattanSource,
sig_threshold: float | None = None,
title: str | None = None,
) -> GeneManhattanPlotTask
Source code in mecfs_bio/build_system/task/gene_manhattan_plot_task.py
execute
GeneManhattanSource
Bases: ABC
A source that yields rows of (chrom, pos, ensembl_id, gene_name, p) for a Manhattan plot.
Methods:
- load_df – Materialize a pandas DataFrame with columns chrom, pos, ensembl_id, gene_name, p_value.
Attributes:
- deps (list[Task])
- genome_build (GenomeBuild) – Genome build of the chromosomal positions exposed by load_df.
- project (str) – The project label inherited from the primary input task's metadata.
- trait (str) – The trait label inherited from the primary input task's metadata.
genome_build
abstractmethod
property
Genome build of the chromosomal positions exposed by load_df.
Drives the hover-text position label (pos_hg19 vs pos_hg38).
project
abstractmethod
property
The project label inherited from the primary input task's metadata.
trait
abstractmethod
property
The trait label inherited from the primary input task's metadata.
load_df
abstractmethod
Materialize a pandas DataFrame with columns chrom, pos, ensembl_id, gene_name, p_value.
GenePValueTableSource
Bases: GeneManhattanSource
Load a Manhattan-plot table from an arbitrary (gene, p-value) table.
Chromosomal positions and the complementary gene identifier (Ensembl ID or
human-readable gene name) are looked up from gene_locations_task
(e.g. the MAGMA Ensembl gene-locations reference) by inner join. Genes
missing from the locations file are dropped because they cannot be placed
on the x-axis.
gene_id_kind declares which identifier the input table uses in
gene_col. The locations reference must contain a matching column:
Ensembl IDs ("ensembl_id") join on the reference's Ensembl-ID column,
gene symbols ("gene_name") join on the reference's gene-name column.
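The lookup described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the task's code: the real load_df works on pandas DataFrames, and the helper name join_locations, the reference column names (chrom, start, end), and the midpoint rule are hypothetical. Only the two gene_id_kind values and the drop-on-miss behaviour come from the description.

```python
# Hypothetical locations reference: one record per gene (made-up genes).
LOCATIONS = [
    {"ensembl_id": "ENSG00000000001", "gene_name": "GENE_A",
     "chrom": "1", "start": 1000, "end": 3000},
    {"ensembl_id": "ENSG00000000002", "gene_name": "GENE_B",
     "chrom": "2", "start": 500, "end": 1500},
]


def join_locations(table, locations, gene_col, p_col, gene_id_kind):
    """Inner-join a (gene, p-value) table onto a locations reference."""
    if gene_id_kind not in ("ensembl_id", "gene_name"):
        raise ValueError(f"unsupported gene_id_kind: {gene_id_kind!r}")
    # Index the reference by whichever identifier the input table uses.
    index = {loc[gene_id_kind]: loc for loc in locations}
    rows = []
    for record in table:
        loc = index.get(record[gene_col])
        if loc is None:
            continue  # inner join: genes without a location are dropped
        rows.append(
            {
                "chrom": loc["chrom"],
                # Assumed midpoint rule: integer mean of start and end.
                "pos": (loc["start"] + loc["end"]) // 2,
                "ensembl_id": loc["ensembl_id"],
                "gene_name": loc["gene_name"],
                "p_value": record[p_col],
            }
        )
    return rows


table = [{"gene": "GENE_B", "p": 0.01}, {"gene": "GENE_X", "p": 0.5}]
joined = join_locations(table, LOCATIONS, "gene", "p", "gene_name")
```

Here GENE_X has no entry in the reference, so only GENE_B survives the join, and its Ensembl ID is filled in from the reference as the complementary identifier.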
Methods:
- load_df
Attributes:
- deps (list[Task])
- gene_col (str)
- gene_id_kind (GeneIdKind)
- gene_locations_task (Task)
- genome_build (GenomeBuild)
- p_col (str)
- project (str)
- table_task (Task)
- trait (str)
load_df
MagmaGeneSource
Bases: GeneManhattanSource
Load a Manhattan-plot table from a MagmaGeneAnalysisTask.
Chromosomal positions come from the MAGMA output itself. Human-readable
gene names are joined in from gene_thesaurus_task by Ensembl ID. When
a gene is missing from the thesaurus, the Ensembl ID is used as the
display name.
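The fallback rule can be sketched as follows. This is a simplified stand-in, assuming the thesaurus has already been reduced to a dict from Ensembl ID to gene name; the helper name attach_gene_names and the sample records are hypothetical, and the real source operates on pandas DataFrames.

```python
def attach_gene_names(magma_rows, thesaurus):
    """Add a gene_name to each row, falling back to the Ensembl ID."""
    named = []
    for row in magma_rows:
        ensembl_id = row["ensembl_id"]
        # Every gene is kept; a missing thesaurus entry only affects the label.
        name = thesaurus.get(ensembl_id, ensembl_id)
        named.append({**row, "gene_name": name})
    return named


# Made-up MAGMA-style rows and thesaurus, for illustration only.
magma_rows = [
    {"ensembl_id": "ENSG00000000001", "chrom": "1", "pos": 2000, "p_value": 0.03},
    {"ensembl_id": "ENSG00000000009", "chrom": "4", "pos": 7500, "p_value": 0.40},
]
thesaurus = {"ENSG00000000001": "GENE_A"}
named_rows = attach_gene_names(magma_rows, thesaurus)
```

Unlike an inner join against a locations reference, this keeps every gene: positions already come from the MAGMA output, so a missing name does not prevent the point from being plotted.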
Methods:
- load_df
Attributes:
- deps (list[Task])
- gene_thesaurus_task (Task)
- genome_build (GenomeBuild)
- magma_task (Task)
- project (str)
- trait (str)
load_df
build_manhattan_plot
build_manhattan_plot(
df: DataFrame,
sig_threshold: float | None,
point_size: int,
colors: tuple[str, str],
sig_line_color: str,
title: str | None,
genome_build: GenomeBuild,
) -> go.Figure
Construct a Plotly figure containing a gene-level Manhattan plot.
Genes with non-positive or null p-values are dropped (-log10 is
undefined). If sig_threshold is None, a Bonferroni-corrected
threshold 0.05 / N_genes is used and a dashed horizontal line is drawn
at the corresponding -log10(p).
genome_build selects the hover label for the gene's midpoint position
(Position (hg19) for build 37, Position (hg38) for build 38).
Positions in df are assumed to already be in the declared build.
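The filtering and threshold logic can be sketched without Plotly. This is a minimal illustration, not the function's actual body: the helper name prepare_points is hypothetical, and it assumes N_genes counts the genes that remain after invalid p-values are dropped, which the description does not state.

```python
import math


def prepare_points(p_values, sig_threshold=None):
    """Return (-log10(p) values, threshold, y position of the threshold line)."""
    # Drop null, NaN and non-positive p-values: -log10 is undefined for them.
    valid = [
        p for p in p_values
        if p is not None and not math.isnan(p) and p > 0
    ]
    if not valid:
        raise ValueError("no valid p-values to plot")
    if sig_threshold is None:
        # Bonferroni correction; N_genes taken as the post-filter count
        # (an assumption of this sketch).
        sig_threshold = 0.05 / len(valid)
    neg_log10 = [-math.log10(p) for p in valid]
    line_y = -math.log10(sig_threshold)
    return neg_log10, sig_threshold, line_y


neg_log10, threshold, line_y = prepare_points([0.05, 0.0, None, 1e-4])
```

With two valid genes the default threshold is 0.05 / 2 = 0.025, so the dashed line sits at -log10(0.025), roughly 1.60. Passing an explicit sig_threshold skips the correction.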