Project Outline

Overview

The ME/CFS Biostatistics repo is a home for reproducible, open-source analysis of datasets pertaining to ME/CFS and other poorly understood diseases.

Principles

There are a few key principles that govern how software should be written in this repo.

Automated Reproducibility: Reproducibility is a central tenet of science, but a great deal of scientific data analysis is difficult to reproduce. Even when the requisite data is public, reproducing a published analysis can be laborious. In this repo, we mitigate this problem by automating reproducibility: it should be possible to reproduce any of the main analyses with a few lines of Python code.
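
As a rough illustration of what "a few lines of Python" could look like, the sketch below wraps data loading and analysis behind a single entry point. All names here (`load_dataset`, `run_analysis`, `fingerprint`, `reproduce`) are hypothetical and are not this repo's actual API, and the "dataset" is synthetic data generated from a fixed seed as a stand-in for a public download.

```python
import hashlib
import random
import statistics


def load_dataset(seed: int = 0) -> list[float]:
    """Hypothetical stand-in for fetching a public dataset.

    Uses a seeded generator so every run sees identical values.
    """
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(200)]


def run_analysis(values: list[float]) -> dict:
    """Illustrative analysis step: basic summary statistics."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
    }


def fingerprint(values: list[float]) -> str:
    """Hash the input data so a reader can confirm they analysed the same values."""
    payload = ",".join(f"{v:.6f}" for v in values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def reproduce(seed: int = 0) -> dict:
    """Single entry point: load the data, analyse it, and record a data fingerprint."""
    data = load_dataset(seed)
    result = run_analysis(data)
    result["data_sha256"] = fingerprint(data)
    return result
```

Under these assumptions, reproducing the analysis is one call, `reproduce()`, and two runs with the same seed return identical results, including the data fingerprint.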

Automated Testing: Software engineering is "programming integrated over time" [1]. It thus involves challenges unseen in short-lived programming projects.

One such challenge is the tendency of new features to break previously developed features. A powerful mitigation against this risk is the use of automated unit and integration tests. Automated tests can verify that old features continue to function as the codebase changes. In this repo, we aim to protect key features with automated testing. See Winters et al. [1] for discussion of the principles of software testing.
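
To make the idea concrete, here is a minimal sketch of a unit test guarding a small analysis helper. The helper `summarize_scores` and its behaviour are hypothetical, chosen only for illustration, and the tests are written as plain functions in the style collected by pytest; the repo's actual test layout may differ.

```python
def summarize_scores(scores):
    """Hypothetical helper: mean of the non-missing (non-None) scores."""
    present = [s for s in scores if s is not None]
    if not present:
        raise ValueError("no non-missing scores")
    return sum(present) / len(present)


def test_ignores_missing_values():
    # Missing entries should be skipped rather than treated as zero.
    assert summarize_scores([1.0, None, 3.0]) == 2.0


def test_rejects_all_missing():
    # With nothing to average, the helper should fail loudly.
    try:
        summarize_scores([None, None])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for all-missing input")
```

If a later change made `summarize_scores` count missing values as zero, the first test would fail and flag the regression before it reached a published analysis.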


  1. Titus Winters, Tom Manshreck, and Hyrum Wright. Software Engineering at Google: Lessons Learned from Programming Over Time. O'Reilly Media, Inc., 2020. URL: https://abseil.io/resources/swe-book/html/toc.html