My last post focused on GPU computing in Python, especially with Theano. After spending more time working through examples, I've learned that many small details matter. One of them is linking your BLAS libraries correctly.
This paper describes Theano as a CPU and GPU math compiler. Theano lets you express mathematical expressions or models as graphs made of input nodes, output nodes, operation nodes, and application nodes, which connect operations to inputs to produce outputs. Optimizers then parse these graphs and try to produce a semantically equivalent graph containing fewer (or faster) operations. Finally, the operations are translated into a lower-level language (depending on whether the CPU or the GPU is the intended target), and that lower-level code is compiled.
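To make the graph idea concrete, here is a toy sketch in plain Python. It is not Theano's API: `Var`, `Const`, `Apply` and `optimize` are hypothetical names that only mimic the structure described above, and the single "optimizer" pass does nothing more than constant folding plus the identities x * 1 → x and x + 0 → x.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """Input (leaf) node: a symbolic variable such as x."""
    name: str


@dataclass(frozen=True)
class Const:
    """Constant leaf node."""
    value: float


@dataclass(frozen=True)
class Apply:
    """Application node: connects an operation ("add" or "mul") to its two inputs."""
    op: str
    inputs: Tuple["Node", ...]


Node = Union[Var, Const, Apply]


def optimize(node: Node) -> Node:
    """Rewrite the graph bottom-up into an equivalent one with fewer operations."""
    if not isinstance(node, Apply):
        return node
    a, b = (optimize(i) for i in node.inputs)
    # Constant folding: both inputs are known, so compute the result now.
    if isinstance(a, Const) and isinstance(b, Const):
        return Const(a.value + b.value if node.op == "add" else a.value * b.value)
    # Algebraic identities: x * 1 -> x and x + 0 -> x (in either operand position).
    identity = 1.0 if node.op == "mul" else 0.0
    if isinstance(b, Const) and b.value == identity:
        return a
    if isinstance(a, Const) and a.value == identity:
        return b
    return Apply(node.op, (a, b))


def count_ops(node: Node) -> int:
    """Number of operation (Apply) nodes in the graph."""
    if isinstance(node, Apply):
        return 1 + sum(count_ops(i) for i in node.inputs)
    return 0


if __name__ == "__main__":
    x = Var("x")
    # The graph for (x * (2 * 0.5)) + 0
    graph = Apply("add", (Apply("mul", (x, Apply("mul", (Const(2), Const(0.5))))), Const(0)))
    print(count_ops(graph), "->", count_ops(optimize(graph)), optimize(graph))
```

For (x * (2 * 0.5)) + 0, the pass folds 2 * 0.5 into 1, drops the multiplication by 1 and the addition of 0, and returns just `x`: three operations become none. Theano's real optimizers are far more extensive, but the idea of rewriting a graph into a cheaper equivalent is the same in spirit.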
All of this happens under the hood, which is what makes Theano so powerful. If you can write down your model in NumPy, translating it into Theano is straightforward, and you get a small (or large) performance boost. Further down the line there are nice features such as profilers, debug modes, and other tools for correcting mistakes and tuning performance.
However, none of this works very well if you can’t configure Theano to link against BLAS libraries. Theano’s documentation is quite extensive, but it assumes you are working with an open source BLAS. I’m running on SciNet, which provides Intel’s MKL, an extremely fast BLAS implementation that beats every other version available on that cluster. MKL is commercial software, so it’s understandable that the Theano docs offer little advice on configuring your ldflags for MKL. So, if you’d like to use MKL with Theano, here are some tips:
- Use the Intel Link Line Advisor: this form will help you get an idea of what your link line should look like. It asks you to fill out every section, some of which may not apply to you, before it makes a suggestion. My advice is to keep only the elements of the suggested line that you need, but preserve their order.
- Read the relevant docs: there is an extensive section on linking that explains the various layers involved.
- Less can be more: if your MKL installation includes the single dynamic runtime library libmkl_rt.so, just try linking against it with -lmkl_rt. This should pick the interface and threading layers that suit your machine.
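The last two tips can be sketched as a small Python helper. Both functions are hypothetical, not part of Theano or MKL: `mkl_ldflags` prefers `-lmkl_rt` when `libmkl_rt.so` exists in the given directory and otherwise falls back to an explicit layered line (assumed here to be the library list from the SciNet line below; on other systems, substitute your own Link Line Advisor output), while `subset_in_order` drops unwanted tokens from an advisor line without reordering the rest.

```python
import os
from typing import Iterable, Sequence

# Explicit layered link line: interface, threading, core, then system libraries.
# Assumption: copied from the SciNet line in this post; replace it with your own
# Link Line Advisor output on other systems.
LAYERED_LIBS = ("-lmkl_intel_lp64", "-lmkl_intel_thread", "-lmkl_core", "-lpthread", "-lm")


def mkl_ldflags(mkl_lib_dir: str, fallback: Sequence[str] = LAYERED_LIBS) -> str:
    """Build an ldflags string, preferring MKL's single dynamic library if present."""
    if os.path.isfile(os.path.join(mkl_lib_dir, "libmkl_rt.so")):
        libs = ["-lmkl_rt"]  # one library; it selects the layers itself
    else:
        libs = list(fallback)  # explicit layers, kept in the advisor's order
    return " ".join([f"-L{mkl_lib_dir}", *libs])


def subset_in_order(advisor_line: str, keep: Iterable[str]) -> str:
    """Keep only the wanted tokens of an advisor line, preserving their original order."""
    wanted = set(keep)
    return " ".join(tok for tok in advisor_line.split() if tok in wanted)
```

Filtering the advisor's line, rather than assembling your own list, is what guarantees the relative order of the libraries stays exactly as suggested.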
For those running on SciNet, here’s the link line that works for me:
-L/scinet/gpc/intel/ics/composer_xe_2011_sp1.9.293/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm
Don’t forget to load the correct modules on SciNet first!
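Once you have a working link line, Theano still has to see it. Theano reads the `blas.ldflags` option either from the `[blas]` section of `~/.theanorc` or from the comma-separated `THEANO_FLAGS` environment variable. The sketch below renders both forms from an ldflags string using only the standard library; the helper names `theanorc_text` and `theano_flags` are hypothetical, and the code does not import Theano.

```python
import configparser
import io


def theanorc_text(ldflags: str) -> str:
    """Render a minimal .theanorc that sets Theano's blas.ldflags option."""
    # interpolation=None so a '%' in a path would be written literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["blas"] = {"ldflags": ldflags}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def theano_flags(ldflags: str) -> str:
    """Render the same setting as a THEANO_FLAGS value for one-off runs."""
    # THEANO_FLAGS is a comma-separated list of key=value pairs,
    # so the link line itself must not contain commas.
    if "," in ldflags:
        raise ValueError("ldflags must not contain commas inside THEANO_FLAGS")
    return f"blas.ldflags={ldflags}"


if __name__ == "__main__":
    scinet = ("-L/scinet/gpc/intel/ics/composer_xe_2011_sp1.9.293/mkl/lib/intel64 "
              "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm")
    print(theanorc_text(scinet))
    print(theano_flags(scinet))
```

After writing the file (or exporting the variable in your job script), running `python -c "import theano; print(theano.config.blas.ldflags)"` should echo the line back, which is a quick way to confirm Theano picked it up.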