Figure System
Plots and tables are important to the presentation of data science results. This section discusses how to add plots and tables (referred to as "figures") to documentation. There are two main approaches:
- The link-based approach
- The programmatic approach
Link-based approach
This is the simplest approach.
To add a figure to a documentation page with this approach, save the figure as a .png file, upload it to a publicly accessible website, and then embed a reference to it in your documentation page. The repository wiki is a convenient host; ask a maintainer for edit permission to the wiki.
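Assuming the documentation page is written in Markdown (as the write-ups in docs/analysis are), the embedded reference is an ordinary Markdown image link. The helper below is a minimal, hypothetical sketch that builds such a link; it is not part of the project, and the URL is a placeholder rather than a real location.

```python
def markdown_image(alt_text: str, url: str) -> str:
    """Return a Markdown image reference for a figure hosted at `url`."""
    return f"![{alt_text}]({url})"


# Placeholder URL for illustration only; substitute the address of your uploaded .png.
line = markdown_image("Coverage histogram", "https://example.com/wiki/coverage_histogram.png")
print(line)
```

The printed line can be pasted directly into the documentation page.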
Programmatic approach
This approach is more complex, but more robust.
The components of the programmatic figure system are:
- The list ALL_FIGURE_TASKS in figure_tasks.py specifies the build-system Tasks that generate programmatic figures.
- The script generate_figures.py invokes the build system to generate the assets corresponding to ALL_FIGURE_TASKS, then copies these figure assets into docs/_figs [1].
- The script pull_figures.py downloads figures from GitHub and merges them with the contents of docs/_figs.
- The script push_figures.py uploads the figure assets in docs/_figs to GitHub as a release. Running this script may require permission from a repository maintainer.
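The generate-then-copy step can be pictured with a small sketch. Everything in it is an assumption made for illustration: FigureTask is a hypothetical stand-in for a build-system Task, the generate_figures signature shown here is not taken from the project, and the placeholder tasks write dummy files instead of rendering plots.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List


@dataclass
class FigureTask:
    """Hypothetical stand-in for a build-system Task that produces one figure file."""
    name: str
    build: Callable[[Path], Path]  # writes the figure into a working directory, returns its path


def generate_figures(tasks: Iterable[FigureTask], work_dir: Path, figs_dir: Path) -> List[Path]:
    """Run each task, then copy the asset it produced into figs_dir."""
    figs_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for task in tasks:
        asset = task.build(work_dir)
        target = figs_dir / asset.name
        shutil.copy2(asset, target)
        copied.append(target)
    return copied


def _write_placeholder(filename: str) -> Callable[[Path], Path]:
    """Return a build function that writes a dummy file; a real Task would render a plot."""
    def build(work_dir: Path) -> Path:
        path = work_dir / filename
        path.write_bytes(b"placeholder figure bytes")
        return path
    return build


# Hypothetical contents; the real list lives in figure_tasks.py.
ALL_FIGURE_TASKS = [
    FigureTask("coverage_histogram", _write_placeholder("coverage_histogram.png")),
    FigureTask("summary_table", _write_placeholder("summary_table.png")),
]

# Demo run inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    work_dir = root / "work"
    work_dir.mkdir()
    copied = generate_figures(ALL_FIGURE_TASKS, work_dir, root / "docs" / "_figs")
    copied_names = [p.name for p in copied]
    print(copied_names)
```

Passing the task list as an argument, rather than reading a global, is a design choice of this sketch; it makes it easy to run the same function on a subset of tasks.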
Standard Workflow
Suppose that you have analyzed a genomic dataset and generated figures. You wish to publish these figures and an associated write-up to the project documentation page. Follow these steps:
- Add the Tasks that generate your figures to ALL_FIGURE_TASKS.
- Either run the generate_figures.py script to generate all figures, or write your own script to call the generate_figures function on just your newly added Tasks. In either case, your figures will be copied into docs/_figs.
- Document your analysis by adding a markdown file to docs/analysis. In your write-up, include your figures by referencing their location in docs/_figs.
- Upload your figures using push_figures.py.
- Create a pull request with your changes (see Standard Workflow).
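For the write-up step, one way to reference a figure in docs/_figs from a file in docs/analysis is a relative Markdown image link. The sketch below assumes the documentation build resolves relative image paths from the write-up's own directory, which the project may or may not do; the function and file names are hypothetical.

```python
import posixpath


def figure_reference(writeup_path: str, figure_path: str, alt_text: str) -> str:
    """Build a Markdown image reference from a write-up to a figure, using a relative path."""
    # Path of the figure relative to the directory that contains the write-up.
    rel = posixpath.relpath(figure_path, start=posixpath.dirname(writeup_path))
    return f"![{alt_text}]({rel})"


# Hypothetical file names, for illustration only.
ref = figure_reference(
    "docs/analysis/my_analysis.md",
    "docs/_figs/coverage_histogram.png",
    "Coverage histogram",
)
print(ref)
```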
Advantages
This programmatic approach is undoubtedly more complex than the link-based approach. Nevertheless, this workflow, in which figures are represented as Tasks, has several advantages:
- Figure Lineage: Since figures are represented as build-system Tasks, it is straightforward to determine exactly which datasets and analyses were used to produce any given figure. This supports the project principle of automated reproducibility and is consistent with data-science best practices.
- Figure Upgrades: Since figures are represented as build-system Tasks, any improvement to the code that generates a particular class of figures can easily be propagated to all figures of that class.
[1] The script regenerate_figures.py is similar to generate_figures, but forces the Tasks that create the figures to be rerun. This is useful if, for example, one of your plotting Tasks has changed and you wish to propagate this change to all plots generated with this Task.