A quick and useful note for organizing comp bio projects

I just happened upon William Noble’s guide to organizing computational biology projects. There are many good practices discussed therein, especially if you don’t have much experience with structuring a project yourself. He touches on a few important points like keeping a readme.txt file in every directory to help chart your progress throughout the project, the virtues of version control, separation of code from data and results.

For me, I would like to push as many of the bookkeeping aspects of a project (status of the project, directory readme.txt files, lab notebooks) onto version control. Why? Because it’s more easily accessible, stored in one place, and can be used to generate any of the reports or views that a readme.txt or lab notebook represents for the given project. Running an experiment? That’s a private github gist with a link to your lab notebook entry (which can be a private blog hosted by github). This way, you use one tool to synchronize different related parts of your project, so your readme.txt files aren’t out of date (or out of sync with your code/results/documentation). It’s easy to share with collaborators, and best of all you don’t have to maintain the infrastructure.

FYI: if you use git, and do not want .pyc / .o / .a / … files cluttering your repo, set up an ignore file.