
Install CellProfilerAnalyst onto a debian / ubuntu workstation

I received an email yesterday from someone asking for advice on how to install CellProfilerAnalyst (CPA) on a Linux workstation. It’s a program that allows you to inspect, manipulate and analyze data from cell images. It accompanies CellProfiler, which is a great tool from the Carpenter lab at the Broad Institute. If your work involves fluorescence imaging of cells in any way, I recommend taking a look.

I’ve been through this process a few times, and while it’s not a picnic, it’s much easier these days. In the past, installation involved horrendous things like compiling and tuning the ATLAS library (an implementation of BLAS; tuning it takes days on a workstation). Today all the packages you need are available from some combination of your package manager and pip. Here are the steps I used to install CPA.

** Note: the original request was for an Amazon EC2 instance of Ubuntu. Make sure that you launch an instance with a reasonably recent release of Ubuntu to get all the packages you’ll need. I used Debian 7.0 (Wheezy), but something like Ubuntu 14.04 or 12.04 should work. I suspect your sources.list file needs the Universe / Multiverse repositories enabled to get the packages you need. **

  1. Launch the instance, and perform apt-get update; apt-get upgrade to refresh your pre-installed packages.
  2. Install the following packages based on Option 2. The instructions are geared towards installing CellProfiler, but for running CPA you should still install the following: WxPython, Cython, Java development kit, MySQLdb, NumPy, SciPy, ZMQ, Matplotlib. The Debian package names for these are: python-zmq, mysql-server, python-numpy, python-scipy, cython, python-wxgtk2.8, java-package, python-matplotlib. The Ubuntu package names should be either identical or very similar. You can install all of these with apt-get install. apt will probably pull in other dependencies as well.

  3. An aside concerning the Java install: java-package is a meta package which will download the latest version of java from Oracle, compile it and create a .deb package (say, java_foo.deb) for your system to install. If apt does not install the package, you may install it with dpkg -i java_foo.deb.

    It is the only package you should have to install (the java-sun-* listed in the wiki are deprecated, ignore them). It’s easy to manage this way; the install instructions (step 8 of Option 2) are a bit messy, so I’d recommend beginning with installing just java-package, and installing the subsequent .deb produced. Then continue with the remaining steps in installing prerequisite packages for CPA. If it fails with some java related error, try making the suggested changes in configuration files or environment variables. Less is more when you’re adding to $PATH or $LD_LIBRARY_PATH, in my opinion.


  4. Use git to grab the latest version of CPA: git clone https://github.com/CellProfiler/CellProfiler-Analyst.git. If your instance doesn’t have git installed by default, apt-get install git. Then cd into the newly created CellProfiler-Analyst directory.
  5. Use pip to install the remaining required packages: pip install python-bioformats; pip install mock; pip install verlib. Installing python-bioformats should also install a python package called javabridge. If pip is not installed by default, apt-get install python-pip.
  6. python CellProfiler-Analyst.py should start CPA. It may instead begin to issue a bunch of text in the terminal about modules being cythonized. As long as you eventually see the CPA logo in a pop-up screen followed by a request to pick a properties file to open, you’ve installed successfully.
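
The steps above can be collected into one script. This is only a minimal sketch, assuming a Debian Wheezy machine and a root shell (or prefix the commands with sudo). The package names and the repository URL come from the steps above; the run helper, the RUN switch and the install_cpa function name are made up for illustration. The Java .deb from step 3 is not handled, since it depends on what java-package produces on your system. By default the script only prints each command so you can read it before committing to anything.

```shell
#!/bin/sh
# Debian package names as listed in step 2, plus git and pip from steps 4 and 5.
APT_PACKAGES="python-zmq mysql-server python-numpy python-scipy cython python-wxgtk2.8 java-package python-matplotlib git python-pip"
# Python packages from step 5.
PIP_PACKAGES="python-bioformats mock verlib"

# run: print the command, or execute it when RUN=1 (a switch invented for this sketch).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

install_cpa() {
    run apt-get update
    run apt-get upgrade
    # Left unquoted on purpose so the list splits into separate package names.
    run apt-get install $APT_PACKAGES
    run git clone https://github.com/CellProfiler/CellProfiler-Analyst.git
    run cd CellProfiler-Analyst
    for pkg in $PIP_PACKAGES; do
        run pip install "$pkg"
    done
    run python CellProfiler-Analyst.py
}
```

Calling install_cpa prints the command list; calling it with RUN=1 set in the environment executes the commands in order.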

Debian wheezy swig2.0-examples package missing configure file

This post is esoteric, but on the off chance someone else needs it, hopefully Google brought you here.

I’m looking into using SWIG to wrap a C++ app in Python. How to do this for anything larger than a one-class or one-file code base is far from clear. So I tried installing the swig2.0-examples Debian package for Wheezy. My progress went like this:

  1. cd into /usr/share/doc/swig2.0-examples/python/import
  2. sudo make
  3. It doesn’t work

All sorts of compiler / make complaints about undefined symbols such as @CXX@. What’s up with that? It turns out the Makefile depends on a master makefile in /usr/share/doc/swig2.0-examples/, which is missing. There *is* an autoconf generated Makefile.in, but no configure script to define the flags for the given host system. It turns out that downloading the source package and running the configure script that is included therein will generate the Makefiles needed:

$ wget http://ftp.de.debian.org/debian/pool/main/s/swig2.0/swig2.0_2.0.7.orig.tar.gz
$ gunzip swig2.0_2.0.7.orig.tar.gz && tar xvf swig2.0_2.0.7.orig.tar
$ cd swig-2.0.7 && ./configure
$ sudo cp Makefile /usr/share/doc/swig2.0-examples

Et voila, the build now succeeds:

lee@beehive:/usr/share/doc/swig2.0-examples/python/import$ sudo make
make -f ../../Makefile SWIG='swig' SWIGOPT='' \
LIBS='' TARGET='base' INTERFACE='base.i' python_cpp
make[1]: Entering directory `/usr/share/doc/swig2.0-examples/python/import'
swig -python -c++ base.i
g++ -c -fpic base_wrap.cxx -I/usr/include/python2.7 -I/usr/lib/python2.7/config
g++ -shared base_wrap.o -o _base.so
make[1]: Leaving directory `/usr/share/doc/swig2.0-examples/python/import'
make -f ../../Makefile SWIG='swig' SWIGOPT='' \
LIBS='' TARGET='foo' INTERFACE='foo.i' python_cpp
make[1]: Entering directory `/usr/share/doc/swig2.0-examples/python/import'
swig -python -c++ foo.i
g++ -c -fpic foo_wrap.cxx -I/usr/include/python2.7 -I/usr/lib/python2.7/config
g++ -shared foo_wrap.o -o _foo.so
make[1]: Leaving directory `/usr/share/doc/swig2.0-examples/python/import'
make -f ../../Makefile SWIG='swig' SWIGOPT='' \
LIBS='' TARGET='bar' INTERFACE='bar.i' python_cpp
make[1]: Entering directory `/usr/share/doc/swig2.0-examples/python/import'
swig -python -c++ bar.i
g++ -c -fpic bar_wrap.cxx -I/usr/include/python2.7 -I/usr/lib/python2.7/config
g++ -shared bar_wrap.o -o _bar.so
make[1]: Leaving directory `/usr/share/doc/swig2.0-examples/python/import'
make -f ../../Makefile SWIG='swig' SWIGOPT='' \
LIBS='' TARGET='spam' INTERFACE='spam.i' python_cpp
make[1]: Entering directory `/usr/share/doc/swig2.0-examples/python/import'
swig -python -c++ spam.i
g++ -c -fpic spam_wrap.cxx -I/usr/include/python2.7 -I/usr/lib/python2.7/config
g++ -shared spam_wrap.o -o _spam.so
make[1]: Leaving directory `/usr/share/doc/swig2.0-examples/python/import'
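
If you want to check whether your copy of the examples has the same problem before running make, something like the following might help. It is a rough heuristic, not part of SWIG or the Debian package: the function name check_examples_makefile is made up, and it assumes that leftover autoconf placeholders look like @CXX@ (an upper-case name between two @ signs).

```shell
# check_examples_makefile DIR
# Prints "missing" if DIR has no Makefile, "unconfigured" if the Makefile
# still contains autoconf-style placeholders such as @CXX@, and "ok" otherwise.
check_examples_makefile() {
    dir="$1"
    if [ ! -f "$dir/Makefile" ]; then
        echo "missing"
    elif grep -q '@[A-Z_][A-Z_0-9]*@' "$dir/Makefile"; then
        echo "unconfigured"
    else
        echo "ok"
    fi
}
```

For example, check_examples_makefile /usr/share/doc/swig2.0-examples should print "ok" once the configured Makefile has been copied into place.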

quick reminder post: pip install prefix

Say you want to install a python package using pip, but you don’t have sudo privileges or root, so you can’t use your site-packages directory, and instead you want to specify some directory below ~. For some reason, the man page for pip does not have the syntax to do this, so I’ll leave it here:

pip install --install-option="--prefix=/path/to/prefix/dir" foo

This works nicely on SciNet, in conjunction with their advice about setting up a local python package directory.
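
Python also has to be told where the prefix directory is, usually through PYTHONPATH. Here is a small sketch, assuming the common layout in which a prefix install puts packages under lib/pythonX.Y/site-packages. Some systems use lib64 or a different directory name, so check what actually got created under your prefix. The helper name prefix_site_packages and the default version 2.7 are my own choices for illustration.

```shell
# prefix_site_packages PREFIX [VERSION]
# Prints the directory where a --prefix install usually puts Python packages.
# VERSION is the Python major.minor version and defaults to 2.7 (an assumption).
prefix_site_packages() {
    echo "$1/lib/python${2:-2.7}/site-packages"
}

# Example with a hypothetical prefix of $HOME/local:
#   PREFIX="$HOME/local"
#   pip install --install-option="--prefix=$PREFIX" foo
#   export PYTHONPATH="$(prefix_site_packages "$PREFIX")${PYTHONPATH:+:$PYTHONPATH}"
```

Putting the export line in your shell startup file makes the local package directory visible in every session.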

A quick thought on writing about failure

I recently finished reading a great resource for CS PhD students that I’d like to advertise: Philip Guo’s PhD Grind.

It’s a memoir that recounts some of the author’s experiences during his PhD. There are plenty of passages which speak to PhD students (Guo posts a fraction of the torrent of feedback he gets on his site). I certainly empathized with some of them. I think he shares some valuable lessons about academic problems that are both common and hard to work through: finding research topics, finding inspiration, how to strike up or nurture collaborations. If you’re spinning your wheels in the lab, or you doubt that you’re smart enough for grad school, or you want to take ownership of your PhD but aren’t sure how to begin, read this book. Each chapter is a gem.

These are lessons rather than recipes. Your circumstances will be different. But he’s been through struggles shared by many CS grad students, and come out wiser. Reading about them should bring you some comfort, and maybe, if you’re lucky, some inspiration.

Oh, I had a point I wanted to make about dealing with professional (i.e., academic) rejection. In The PhD Grind, Guo experiences plenty of paper rejections and frustrations, and ends up questioning whether or not he’s got the chops to continue in academia. He doubts that he’s good enough, that he’s smart enough, that he’s accomplished enough to earn a professorship. As the book ends he decides to ‘retire’ from academic life, for a number of reasons; a decision with which he’s perfectly at peace. Today he’s a professor at the University of Rochester. What changed?

It made me recall a similar personal story from the philosopher Mark Kingwell, which I encountered in his book called the Pursuit of Happiness. He wrote it many years ago while working as a sessional lecturer at the University of Toronto. I can still recall him recounting with precision how he felt he did not belong in the philosophy department at U of T. He finished the book with, it seemed, every intention of never working there again. Today he’s a tenured professor at the University of Toronto. What changed?

Did each arrive at some kind of Michael Corleone moment? Probably not. But I do think there’s a common lesson illustrated here. Maybe these two examples show how writing about failure helps to channel personal feelings of self-doubt into something more constructive. Maybe writing candidly about failure [1] has therapeutic benefits. Maybe it helps release you from the pressure of not achieving your goal, and allows you to refocus on what you should do next.

[1] This probably only works when you publish the work in some form.

R by Radford Neal

Radford Neal recently announced the release of pretty quick R (pqR), a fork of R from 2.15.0. It’s got lots of nice performance boosts, the most valuable of which for me is transparent multiprocessing capability.

I won’t repeat any more of the post announcing it; check it out for yourself. I hate to be a back-seat developer, but why not use the name fastR?


Large Scale Machine Learning course at NYU

John Langford and Yann LeCun have teamed up to teach an advanced course in machine learning. The focus is on methods that scale to large data sets. Though it will not be offered as a MOOC, they do plan to record the lectures (I imagine they will appear on videolectures.net). Well worth checking out.

Hadley Wickham’s Rcpp tutorial

If you are conversant in R, you probably know about Hadley Wickham.  His ggplot package for visualization is fantastic.  His plyr package covering the split-apply-combine design pattern is even better (and plays so nicely with parallelized backends for the foreach package like snow or multicore).

Now, via the great blog SimplyStatistics, I found that Hadley has written a lauded tutorial about how to use the Rcpp package.  From the tutorial:

Sometimes R code just isn’t fast enough – you’ve used profiling to find the bottleneck, but there’s simply no way to make the code any faster. This [tutorial] is the answer to that problem.

Obviously, Rcpp is no speed panacea. If you’ve already done the hard work of vectorizing your code, then moving part of it to C++ might not yield a great improvement in speed, and might be more work than it is worth. But there are some good use cases covered:

  1. Say you have a loop in which each iteration depends on previous iterations. It cannot be trivially parallelized, but should not be too much work to translate into C++.

  2. Say you have an algorithm that is based on more complicated data structures (e.g., ball trees for k-nearest neighbours) that are not convenient to use in R. Rcpp would make using a data structure like this a breeze.

It’s all in there, go have a look.