Figure System
Plots and tables are important to the presentation of data science results. This section discusses how to add plots and tables (referred to as "figures") to documentation. There are two main approaches:
- The link-based approach
- The programmatic approach
Link-based approach
This is the simplest approach.
With this approach, if you wish to add a figure to a documentation page, simply save the figure as a .png file, upload it to a publicly accessible website (The repository wiki is convenient. Ask a maintainer to be granted edit permission to the wiki), and then embed a reference in your documentation page:
Programmatic approach
This approach is more complex, but more robust.
The components of the programmatic figure system are:
- The list
ALL_FIGURE_TASKSin figure_tasks.py specifies the build-system Tasks that generate programmatic figures. - The script generate_figures.py invokes the build system to generate the assets corresponding to
ALL_FIGURE_TASKS, then copies these figure assets intodocs/_figs1. - The committed manifest
mecfs_bio/figures/figures_manifest.jsonmaps every live figure path (relative todocs/_figs) to the SHA-256 of its contents. The manifest is the source of truth for which figures are part of the project. - The script pull_figures.py reads the manifest and downloads any missing or out-of-date blobs from the figures GitHub release into
docs/_figs. Passprune=Trueto also delete local files not listed in the manifest. - The script push_figures.py hashes the contents of
docs/_figs, updates the manifest, and uploads any blobs not yet present on the release. Each blob is stored on the release as a content-addressed asset (asset name = SHA-256), so pushes never overwrite existing blobs and concurrent updates by different collaborators surface as ordinary git merge conflicts on the manifest. Passprune=Trueto drop manifest entries whose files are no longer present locally. Running this script may require permission from a repository maintainer.
Standard Workflow
Suppose that you have analyzed a genomic dataset and generated figures. You wish to publish these figures and an associated write-up to the project documentation page. Follow these steps:
- Add the Tasks that generate your figures to
ALL_FIGURE_TASKS. - Run
pixi r invoke publish-figuresto generate all figures, copy them todocs/_figsand push them as part of a github release.publish-figureswill also remove any figures not present inALL_FIGURE_TASKS. - Document your analysis by adding a markdown file to
docs/analysis. In your write-up, include your figures by referencing their location indocs/_figs. - Commit the updated
figures_manifest.jsonalong with your other changes. - Create a pull request with your changes (see Standard Workflow).
To remove a figure, delete the corresponding Task from ALL_FIGURE_TASKS, then use publish_figures.py
Advantages
This programmatic approach is more complex than the link-based approach. Nevertheless it has several advantages;
- Figure Lineage: Since figures are represented as build-system Tasks, it is straightforward to determine exactly the datasets and analysis used to produce any given figure. This supports that project principle of automated reproducibility and is consistent with data-science best practices.
- Figure Upgrades: Since figures are represented as build system Tasks, any improvement to that code that generates a particular class of figures can easily be propagated to all figures of that class.
-
The script regenerate_figures.py is similar to
generate_figures, but forces the tasks which create the figures to be rerun. This useful if, for example, one of your plotting Tasks has changed and you wish to propagate this change to all plots generated with this Task. ↩