# Install CellProfilerAnalyst onto a debian / ubuntu workstation

I received an email yesterday from someone asking me for advice on how to install CellProfilerAnalyst (CPA) on a linux workstation. It’s a program that allows you to inspect, manipulate and analyze data from cell images. It accompanies CellProfiler, which is a great tool from the Carpenter lab at the Broad Institute. If you’re doing work that involves fluorescence imaging of cells in some aspect, I recommend taking a look.

I’ve been through this process a few times, and while it’s not a picnic, it’s much easier these days. In the past, installation involved horrendous things like compiling and tuning the ATLAS library (it’s an implementation of blas; takes days to do on a workstation). Today all the packages you need are available from some combination of your package manager and pip. Here are the steps I used to install CPA.

** Note: the original request was for an Amazon EC2 instance of ubuntu. Make sure that you choose to launch an instance with a contemporary offering of Ubuntu to get all the packages you’ll need. I used Debian 7.0 (Wheezy), but something like Ubuntu 14.04 or 12.04 should work. I suspect your sources.list file should have Universe / Multiverse repositories enabled, to get the packages you need. **

1. Launch the instance, and perform `apt-get update; apt-get upgrade` to refresh your pre-installed packages.
2. Install the following packages based on Option 2. The instructions are geared towards installing CellProfiler, but for running CPA you should still install the following: WxPython, Cython, Java development kit, MySQLdb, Numpy, Scipy, ZMQ, Matplotlib. The debian package names for these are: `python-zmq, mysql-server, python-numpy, python-scipy, cython, python-wxgtk2.8, java-package, python-matplotlib`. The ubuntu packages names should be either identical or very similar. You can use `apt-get install` to install all these. There will probably be other package dependencies that apt will install.

3. An aside concerning the Java install: `java-package` is a meta package which will download the latest version of java from Oracle, compile it and create a .deb package (say, java_foo.deb) for your system to install. If apt does not install the package, you may install it with `dpkg -i java_foo.deb`.

It is the only package you should have to install (the java-sun-* listed in the wiki are deprecated, ignore them). It’s easy to manage this way; the install instructions (step 8 of Option 2) are a bit messy, so I’d recommend beginning with installing just java-package, and installing the subsequent .deb produced. Then continue with the remaining steps in installing prerequisite packages for CPA. If it fails with some java related error, try making the suggested changes in configuration files or environment variables. Less is more when you’re adding to \$PATH or \$LD_LIBRARY_PATH, in my opinion.

4. Use git to grab the latest version of CPA: `git clone https://github.com/CellProfiler/CellProfiler-Analyst.git`. If your instance doesn’t have git installed by default, `apt-get install git`. Then cd into the newly created CellProfiler-Analyst directory.
5. Use pip to install the remaining required packages: `pip install python-bioformats; pip install mock; pip install verlib`. Installing python-bioformats should also install a python package called javabridge. If pip is not installed by default, `apt-get install python-pip`.
6. `python CellProfiler-Analyst.py` should start CPA. It may instead begin to issue a bunch of text in the terminal about modules being cythonized. As long as you eventually see the CPA logo in a pop-up screen followed by a request to pick a properties file to open, you’ve installed successfully.

# This is why we need code review.

One of the most fundamental operations in parallel computing is an operation called scan. Scan comes in two varieties, inclusive scan and exclusive scan (depending on whether $y_i$ includes $x_i$ or not.

From wikipedia: In computer science, the prefix sum, scan, or cumulative sum of a sequence of numbers $x_0, x_1, x_2,\; \dots$ is a second sequence of numbers $y_0, y_1, y_2,\; \dots$ , the sums of prefixes (running totals) of the input sequence $y_i = y_{i-1} + x_i$ (note this definition is inclusive).

Simple enough right? I’ve seen two simple variants for computing scan, one by Hillis and Steele, the other by Blelloch. Though it seems like scan is an inherently serial computation due to the dependency on the previous input, it actually decouples very nicely and can be computed in parallel very easily. Hillis and Steele’s exclusive scan algorithm goes something like this:

``` len = input.size; output[threadId] = input[threadId]; for (int i = 1; i = i) { output[threadId] += input[threadId - i] } } ```[ed: wordpress is not rendering this properly, but you can find the pseudo-code description in the link below]

Hillis and Steele’s scan is what is called step efficient: it executes in $O(\log(n))$ steps for inputs of size $n$. But at each step, it performs $O(n)$ operations, and so the overall work complexity is $O(n * \log(n))$. Blelloch’s is more complex, but is more work efficient: it requires only $O(n)$ operations. Here’s my code for Hillis and Steele’s inclusive scan:

```__global__ void hs_prefix_add(int * d_in, unsigned int *d_out) { extern __shared__ unsigned int s_in[]; extern __shared__ unsigned int s_out[]; int tid = threadIdx.x; ```

``` // load input into shared memory. s_in[tid] = static_cast(d_in[tid]); s_out[tid] = s_in[tid]; __syncthreads(); for (int offset = 1; offset < blockDim.x; offset <= offset) { s_out[tid] += s_in[tid - offset]; } __syncthreads(); } ```

``` d_out[tid] = s_out[tid]; }```

I’m using shared memory for the buffers since it’s directly on board the thread blocks, and is thus much faster to access than going back to global memory for every step of the loop. Simple, right? When I first tried to implement scan, I thought I’d see if Nvidia would offer any pointers explaining the finer points. After all, my implementation doesn’t handle cases with arrays that are not a power of 2, and won’t scale to arrays larger than the dimension of a block (which happened to be fine for the problem I was solving, but not o.k in general). So I found this document, talking about prefix sum. Here’s their algorithm for Hillis and Steele’s exclusive scan:

```__global__ void scan(float *g_odata, float *g_idata, int n) { extern __shared__ float temp[]; // allocated on invocation int thid = threadIdx.x; int pout = 0, pin = 1; // load input into shared memory. // This is exclusive scan, so shift right by one and set first elt to 0 temp[pout*n + thid] = (thid > 0) ? g_idata[thid-1] : 0; __syncthreads(); for (int offset = 1; offset = offset) temp[pout*n+thid] += temp[pin*n+thid - offset]; else temp[pout*n+thid] = temp[pin*n+thid]; __syncthreads(); } g_odata[thid] = temp[pout*n+thid1]; // write output }```

When I read this code, I have to ask at least the following:

• I don’t understand why this code uses ‘double buffer indices’ for one large array, rather than two shared memory arrays. It makes for less readable code, and introduces a potential bug if the user places the swap below the if block rather than above.
• The last line which writes the output back to global memory contains a typo (thid1 should be thid)

This could go into a textbook as an example of why we need code review. That it was published with a compilation error tells me that this code was un-tested. Even after fixing it, try running it at home and see what you get (hint: not what you would expect).

Perhaps the mistakes and obfuscation were intentional, to foil google-copy-and-paste homework submissions for future courses. But I think that greater damage is done by having present and future searches point to obfuscated and incorrect code (the same code, error and all, appears in these slides). Parallel computation is hard enough without having to decode stuff like this. We should be teaching people to write legible, easily verifiable code that can later be optimized or generalized.

# Looking for an industry job? Take note

I’m currently on leave to do an internship at a startup company. When people asked me why I decided to pursue an internship, I replied (in jest) that after so many years of grad school, I should try to show that I’m still employable. Today I read an article that suggests this is more true than I realized.

The article by Chand John, a PhD grad in Computer Science from Stanford, underscores the importance of industry experience or exposure when targeting an industry job after graduation. John’s job search (thankfully successful) took one whole year. He went to informational interviews, he vetted his resume with friends in industry, he studiously prepared for each one-on-one interview. Getting interviews? Not a problem, he landed more than 30 interviews before finding a job which interested him and a company willing to take a chance on a PhD grad with no industry experience:

No one could pinpoint anything I was doing wrong. Professors and industry veterans inferred I must be saying something really crazy to destroy myself in 30-plus interviews: There was “no way” a person with my credentials could be denied so many jobs. However, I had said nothing crazy. My interviews had largely gone smoothly. And I did eventually land a job closely related to my Ph.D. But the opportunity didn’t arise until a year after finishing my doctorate. Before that lucky break, my accomplishments and efforts weren’t paying off.

Why?

As a scientist, I had already been gathering data about that question. Each time I was rejected from a job, I asked the companies for reasons. They were often vague, but two patterns emerged: (1) Companies hesitated to hire a Ph.D. with no industry experience (no big surprise) even if they had selected you for an interview and you did well (surprise!). And (2) my Ph.D. background, while impressive, just didn’t fit the profile of a data scientist (whose background is usually in machine learning or statistics), a product manager (Ph.D.’s couldn’t even apply for Google’s Associate Product Manager Program until recently), or a programmer (my experience writing code at a university, even on a product with 47,000 unique downloads, didn’t count as coding “experience”).