A case for Open Code

I’ve mentioned openly sharing code as a part of Reproducible Analytical Pipelines, several times already, but I suspect it’s the point to be met with most resistance and so I want to dive into it a bit more.

Why publish your code?

Put from the perspective of good practice as the default, can you justify not publishing your code? Here are the main benefits of Open Code:

  • It’s more efficient - Sharing and reusing code reduces the duplication and burden of work. Changes may be needed for specific audiences, but how many of the same data tasks are being carried out again and again across national, regional and local healthcare teams?
  • It encourages good practice - If you know others can see your work you’re more likely to care about it being good work, and thus end up with a healthier and more maintainable code base.
  • It’s more transparent - If you’re publicly-funded and working with the public’s data, why shouldn’t your methodologies be available publicly for others to check, reuse and build on?
  • It’s good for collaboration - we can learn from each other, both across departments and from external users.

But Open Code isn’t beneficial to me personally!

Incorrect! Sorry, I just really want to reiterate those points. If you write something to be reproducible by someone else, it’s going to be more reproducible by you in the future, or there may be less need for you to do it. If you open yourself to, yes - criticism, you open yourself to help and improvement. If you engage with the open-source community you can and will learn, challenge and teach, growing your wealth of skills, knowledge and experiences.

Again, there is a great deal of government and NHS support

None of that came from just me, I was mostly just ticking off the points made by the NHS Digital RAP Community of practice on How to Publish your Code in the Open.

The Government’s Digital Service Standard (and therefore the NHS Digital Service Standard’s) 12th principle states that all publicly funded code should be open, reusable and available under appropriate licences.

Addressing security concerns

You can choose what files get included in your GitHub repository. We choose to share the code and documentation but none of the data or other sensitive info like API keys.

Be clear what is actually sensitive or not

I’m just going to be copying here because it’s really well put by the GOV.UK Service Manual:

There are very few examples of code that must not be published in the open.

A helpful quick list is provided by the government cyber security guidance:

Closed code

You should keep some data and code closed, including:

  • keys and credentials
  • algorithms used to detect fraud
  • unreleased policy

Open code

You should open all other code. This includes:

  • configuration code
  • database schema
  • security-enforcing code

Safe sharing specifics

Here’s a couple of details if you’re wondering how managing files safely can work.

Ignoring files in Git

You include a .gitignore file in your project that specifies intentionally untracked files and folders to ignore, so that you can include files in your project on your computer, but they don’t get put on the remote repository. It’s under your control, so yes you have responsibility here. As a simple example, include the following line in a .gitignore to ignore all .csv files, using the * wildcard:

*.csv

Environment variables

A key piece of sensitive info are things like access keys and credentials. Unfortunately, sometimes we do have to write them down to make use of them. It’s important to make these don’t get shared, and one approach is to store them in environment variables.

Create the necessary variables in an environment file, then write your main script that includes accessing and using those variables1. Make sure the environment file is ignored by git, but do share the script. Document how to set up the environment variable, so that others can execute the code with their own keys/credentials.


  1. The R-curious should look into .Renviron files and Sys.getenv().↩︎