I received an email yesterday from someone asking for advice on installing CellProfiler Analyst (CPA) on a Linux workstation. It’s a program that lets you inspect, manipulate, and analyze data from cell images. It accompanies CellProfiler, a great tool from the Carpenter lab at the Broad Institute. If your work involves fluorescence imaging of cells in any capacity, I recommend taking a look.
I’ve been through this process a few times, and while it’s not a picnic, it’s much easier these days. In the past, installation involved horrendous things like compiling and tuning the ATLAS library (an implementation of BLAS; tuning it can take days on a workstation). Today all the packages you need are available from some combination of your package manager and pip. Here are the steps I used to install CPA.
**Note: the original request was for an Amazon EC2 instance running Ubuntu. Make sure you launch an instance with a contemporary release to get all the packages you’ll need. I used Debian 7.0 (Wheezy), but something like Ubuntu 14.04 or 12.04 should also work. On Ubuntu, your sources.list will likely need the Universe / Multiverse repositories enabled to get all the packages below.**
- Launch the instance, and perform
apt-get update; apt-get upgrade to refresh your pre-installed packages.
- Install the prerequisite packages from Option 2 of the CellProfiler instructions. Those instructions are geared towards installing CellProfiler itself, but for running CPA you should still install the following: wxPython, Cython, a Java development kit, MySQLdb, NumPy, SciPy, ZMQ, and Matplotlib. The Debian package names for these are:
python-zmq, python-mysqldb (plus mysql-server if you want a local database), python-numpy, python-scipy, cython, python-wxgtk2.8, java-package, python-matplotlib. The Ubuntu package names should be either identical or very similar. You can use
apt-get install to install all these. There will probably be other package dependencies that apt will install.
An aside concerning the Java install:
java-package is a meta-package which takes an Oracle Java release and repackages it into a .deb (say, java_foo.deb) for your system to install. If apt does not install the resulting package automatically, you can install it with
dpkg -i java_foo.deb.
It is the only Java package you should have to install (the java-sun-* packages listed in the wiki are deprecated; ignore them). It’s easy to manage Java this way. The install instructions (step 8 of Option 2) are a bit messy, so I’d recommend starting by installing just java-package, then installing the .deb it produces, and continuing with the remaining prerequisite packages for CPA afterwards. If something fails with a Java-related error, try the suggested changes to configuration files or environment variables, though in my opinion less is more when you’re adding to $PATH or $LD_LIBRARY_PATH.
- Use git to grab the latest version of CPA:
git clone https://github.com/CellProfiler/CellProfiler-Analyst.git. If your instance doesn’t have git installed by default,
apt-get install git. Then cd into the newly created CellProfiler-Analyst directory.
- Use pip to install the remaining required packages:
pip install python-bioformats; pip install mock; pip install verlib. Installing python-bioformats should also install a python package called javabridge. If pip is not installed by default,
apt-get install python-pip.
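Before launching, a quick way to confirm the prerequisites are in place is to try importing them. This is just a sanity-check sketch; the module names below are my best mapping of the packages installed above to their Python import names:

```python
# Report which CPA prerequisites fail to import. The names here are
# the import-level names of the packages installed in the steps above
# (e.g. the python-bioformats pip package imports as "bioformats").
mods = ["numpy", "scipy", "zmq", "wx", "MySQLdb",
        "matplotlib", "bioformats", "javabridge"]
missing = []
for m in mods:
    try:
        __import__(m)
    except ImportError:
        missing.append(m)
print("missing:", ", ".join(missing) or "none")
```

If anything shows up as missing, revisit the corresponding apt or pip step before trying to launch CPA.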
python CellProfiler-Analyst.py should start CPA. It may first emit a stream of messages in the terminal about modules being Cythonized. As long as you eventually see the CPA logo in a pop-up window, followed by a prompt to pick a properties file to open, you’ve installed successfully.
My last post focused on GPU computing in Python, and especially Theano. After a bit more time working through examples, I know there are many little details which matter. One of those details is linking your blas libraries correctly.
This paper describes Theano as a CPU and GPU math compiler. Theano lets you express mathematical expressions or models as graphs: variable nodes for inputs and outputs, operation nodes, and apply nodes, which connect operations to their inputs to produce outputs. These graphs are then passed through optimizers that try to produce a semantically equivalent graph containing fewer (or faster) operations. Finally, the operations are translated into a lower-level language (depending on whether the CPU or GPU is the intended target), and that lower-level code is compiled.
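To make the graph-plus-optimizer idea concrete, here is a toy illustration in plain Python (not Theano's actual API; its real graphs and rewrite system are far richer). It builds a tiny expression graph and applies one algebraic rewrite, x * 1 → x, of the kind an optimizer performs before code generation:

```python
# Toy expression graph: variables, constants, and a multiply node.
class Var:
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Const:
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v

class Mul:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) * self.b.eval(env)

def simplify(node):
    # One rewrite rule: x * 1 -> x. A real optimizer applies many such
    # semantics-preserving rewrites across the whole graph.
    if isinstance(node, Mul):
        a, b = simplify(node.a), simplify(node.b)
        if isinstance(b, Const) and b.v == 1: return a
        if isinstance(a, Const) and a.v == 1: return b
        return Mul(a, b)
    return node

expr = Mul(Var("x"), Const(1))
opt = simplify(expr)
print(type(opt).__name__)   # the multiply disappears entirely
print(opt.eval({"x": 7}))
```

The optimized graph computes the same value with one fewer operation; Theano does this at scale, then hands the simplified graph to a C or CUDA code generator.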
All of this happens under the hood, which makes Theano very powerful. If you can write down your model in NumPy, translation into Theano is straightforward, and you get a small (or large) performance boost. Further down the line there are nice features such as profilers, debug modes, and other tools for correcting mistakes and tuning performance.
However, none of this works very well if you can’t configure Theano to link against your BLAS libraries. Theano’s documentation is quite extensive, but it assumes you are working with an open-source BLAS. I’m running on SciNet, which has an extremely fast BLAS implementation, Intel’s MKL, that beats every other version available on that cluster. MKL is commercial software, so it’s understandable that the Theano docs offer little advice on configuring your ldflags for it. So, if you’d like to use MKL with Theano, here are some tips:
- Use the Intel Link Line Advisor: this form will give you an idea of what your link line should look like. It wants every section filled out, some of which may not apply to you, before making a suggestion. My advice is to keep only the subset of flags that applies to your setup, but preserve their order.
- Read the relevant docs: there is an extensive section about linking, explaining the various layers involved.
- Less can be more: if your installation of MKL built a single dynamic runtime library
libmkl_rt.so, then just try linking that with
-lmkl_rt. This single runtime library selects the correct interface and threading layers for your machine.
For those running on SciNet, here’s the link line that works for me:
-L/scinet/gpc/intel/ics/composer_xe_2011_sp1.9.293/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm
Don’t forget to load the correct modules on SciNet first!
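Theano picks the link line up through its blas.ldflags configuration option, which you can set in ~/.theanorc. A minimal sketch, assuming the single dynamic runtime library mentioned above (the MKL path is a placeholder; substitute your installation’s lib directory, or the full SciNet line from above):

```
[blas]
ldflags = -L/path/to/mkl/lib/intel64 -lmkl_rt
```

The same string can also be passed at run time via the THEANO_FLAGS environment variable if you’d rather not keep a config file around.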
For those using Python to develop and apply machine learning algorithms, there is now a plethora of tools at your disposal. Plenty of toolkits collect classic or standard models (my favourite being scikit-learn), but for various reasons these do not scale to larger data sets. And with the exception of a few use cases, like bagging or cross-validation, few of these methods or model-fitting procedures can be parallelized simply. So what do you do if you want your code to run faster, but cannot just throw more processors at the problem, and don’t want to reinvent a slightly better wheel?
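Cross-validation is the canonical easy case: each fold’s fit-and-score is independent of the others, so the folds can be farmed out to workers directly. A sketch with toy data and a placeholder model (the mean predictor stands in for a real fit; threads keep the snippet portable, though CPU-bound fitting would use processes, e.g. ProcessPoolExecutor or joblib):

```python
from concurrent.futures import ThreadPoolExecutor

def score_fold(args):
    train, test = args
    mean = sum(train) / len(train)  # "fit": predict the training mean
    # "score": mean squared error on the held-out fold
    return sum((y - mean) ** 2 for y in test) / len(test)

data = list(range(20))
k = 4
folds = [data[i::k] for i in range(k)]
# Each job pairs a training set (all other folds) with its test fold.
jobs = [([y for j, f in enumerate(folds) if j != i for y in f], folds[i])
        for i in range(k)]

with ThreadPoolExecutor(max_workers=k) as ex:
    scores = list(ex.map(score_fold, jobs))

print(scores)
```

The point is structural: nothing in one fold’s computation depends on another’s, which is exactly what most iterative model-fitting procedures lack.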
One possible way out is to use a high-level GPU compiler. I have recently been looking into projects which combine accelerated computation with NumPy’s unadorned simplicity of expression. They all use GPUs for acceleration, and vary in how complex they are to install and configure. All of them depend on installing the CUDA SDK and the CUBLAS library in some form, but both are easily installed via your Linux package manager. For example, on Debian Wheezy / Sid / Experimental,
sudo apt-get install nvidia-cuda-dev nvidia-cuda-toolkit libcublas4 will do the trick.
- CUDAmat: this library wraps up various matrix operations in CUDA kernels, which are in turn backed by CUBLAS. Perhaps the simplest to install.
- Theano: this library does quite a lot in addition to pushing more complex calculations to a GPU device (also reliant on CUDA/CUBLAS). It also does symbolic differentiation, and has a pretty thriving user community. Definitely recommended.
- Numba: while I have not yet tried this package, it seems very promising. Travis Oliphant of NumPy fame heads the company developing this NumPy-aware Python compiler. It uses LLVM to compile decorated Python functions down into fast machine code. See Travis speak about it here, and Jake Vanderplas compare it with Cython here. Just recently a CUDA backend for Numba was announced, which is exciting.
My first experiments have been with Theano. Configuring it to work on SciNet’s ARC has been a bit of a pain (not having root access and all), but so far so good. Initial tests show a speed-up of roughly a factor of 40 using the GPU versus a single CPU thread on matrix-product tasks. I should know in short order whether the auto-encoder-based dimensionality reduction model I proposed in December will stack up (see what I did there?) against PCA, ISOMAP, and LLE.