The Bit-Bucket Challenge

The ALS Ice Bucket Challenge is sweeping the internet, and people are busy coming up with crafty and audacious ways to dump icy water on their heads. I’d like to propose a lesser challenge, one that would help people who work with data everywhere. I call it the bit-bucket challenge. In the words of Godard:

it’s not important where you steal from, it’s where you bring it to

So, here’s my proposal:

ML researchers of the world: please, please, please, release your code under a permissive (BSD,MIT, maybe L-GPL) license.

No need to make a video, no need to agonize over the fine line between slackto-narcisissism, and do-goodery. None of that. This is *much* easier. Here’s how to do it:

  1. Sign up on GitHub, or bit-bucket, or source-forge, or make your own webpage. Just find somewhere to host your project.
  2. Take the readme.txt you stuck in the main working directory of your project, and annotate that in markdown. It should tell people how to build your code, and make note of any obvious stumbling points in the code. Take this time to write one, if you haven’t already done so.
  3. Upload all of that onto the internet (see step 1).
  4. Make a blog post or a tweet briefly describing what you’ve just posted.
  5. Tag three colleagues in your post, challenging them to release a project that they’ve published as a paper.

That’s it.

Is your code ugly? Full of hacks? Hard coded values? Bubble sort? Do not let this deter you; if your code is the least bit functional, someone who has read your paper and would like to try your algorithm will take that mess and improve it. They will patch it, refactor it or make other radical changes to suit their purpose. Chances are good that they’ll share these improvements with you, gratis! Heck, they might even write new tests for your code (you have some tests, right?). But they can’t do that unless you give them the chance.

Why is this a worthwhile challenge? Well, re-implementing a scientific paper is hard. Also, software is now firmly entrenched in the scientific process. If you’re doing computational method development and just writing about your results, I don’t think you’re doing it properly.

“if it’s not open and verifiable by others, it’s not science, or engineering, or whatever it is you call what we do.” (V. Stodden, The scientific method in practice)

(h/t to Gael Varoquaux)

Worried that no one will care about your code? You’re in good company. There are plenty of projects out there not used by anyone other than the author, or algorithms of marginal value. Believe me, I’ve written plenty. But on the off-chance that someone *does* find your code useful, you’ve just saved this person many, many hours of work in trying to re-implement your algorithm from scratch.

So please, dear friends, let’s all share some code. Just do it. Do it for the betterment of us all.

Advertisements