# Category Archives: Talks and Conferences

As pertaining to talks and conferences I attend

# NIPS2014 de-brief: the Friday workshops

Caveat lector: I began composing this back in January, then set it aside to work on thesis stuff. It has been overlooked ever since. NIPS is getting pretty far in the rear-view, but the workshops this year were excellent, and deserve some mention. So, though it’s under-prepared, and looking back I see I didn’t take nearly as many notes as I should have, I’m going to force this post out now. Please don’t interpret the sparsity of my notes as lack of interest.

The last two days of NIPS2014 were all about the workshops. On Friday, I attended the inaugural offering of Machine Learning for Clinical Data Analysis, as well as the workshop on Deep Learning and Representation Learning.

Machine Learning for Clinical Data Analysis

I wish I had taken better notes from the morning session of ml4chg. The invited talk by Gilles Clermont was structured around trying to find the right questions to ask with respect to using machine learning in clinical settings. The ICU is expensive, and technologically intensive. Mortality is high, and risk is high. Doctors want to be convinced beyond just figures showing AUPR, AUROC are good. They want to be convinced that we can use fancy computer models to reduce errors, and manage data. I left thinking that there remain a lot of questions to be answered (and relationships between clinicians and researchers to be built) before any rubber hits the road. The Q&A session that finished off the morning underscored this last point.

Deep Learning

I arrived at the first (!) poster session. Here were a couple that caught my eye.

Invited talk by Rich Schwartz (BBN): Fast and Robust Shallow Neural Networks Joint Models for Machine Translation

• shallowest ngram translation model. Simpler was better, but no phrases, no previous translations that were incorporated.
• Yoshua had loads of ideas to improve what was presented.

Invited talk by Vlad Mnih (Google DeepMind): Deep Reinforcement Learning and REINFORCEd Deep Learning:

• RL and deep learning should be better friends.
• big problems: noisy, delayed signals. Nns don’t necessarily work well in non stationary environments, which is the case in RL
• Q learning updates: smoothed action reward scheme. Froze params of network that generated the target actions, periodically refreshed this.
• another nice thing about RL: add non-differentiable components for NNs using reinforce.
• attention model reduces the amount of computation for convnets, fixes the amount of conv

Invited talk by Surya Ganguli (Stanford): From statistical physics to deep learning: on the beneficial role of dynamic criticality, random landscapes, and the reversal of time.

• good stuff from statistical physics that can help deep learning (published work on his website)
• theory of deep learning (initializing weight dynamics theory), says number of epochs does not need to increase with depth, if chosen properly.
• questions from: Geoff Hinton, Then Yann Lecun, then Yoshua Bengio. Got a big laugh from the audience, but was in itself instructive; the three wise men all believe that physics has something to teach us about the mechanics of error surfaces in deep learning.

That’s it. There were a few photos of posters that I wanted to recall the details later that I won’t post, since while I asked for permission to take the photos, I didn’t think to ask permission to share them. The Saturday workshops were all about MLCB for me, to which I’ll devote a separate post.

# NIPS2014 de-brief: the main conference

NIPS 2014 has come and gone. My brain is full of ideas, I’ve met a host of new people, and my body feels shattered by the 7:30 to 12:00 schedule. This was my second time attending NIPS, and this meeting has strengthened my opinion that the most valuable part of the conference are the posters. The poster sessions are the first-class content. They’re interactive, all the papers are online, and there’s no scheduling problems. Just go around, find work that suits you, and meet the people responsible. For such a large single track conference, they’re indispensable. While you can read the papers online from anywhere, it is nowhere near so rewarding as speaking to the authors directly. If you ask the right question, and the author isn’t too wiped out by the previous four hours of Q&A, you might get insights that don’t make it into the paper, which can be the beginning of a great new idea. It’s fast mixing of chains.

Perhaps I will begin to feel differently about these exhausting sessions as I attend more meetings, and will become less inclined to spend hours among the halls. But for now, the opportunity to connect with such smart, committed people and to tell them personally how much I appreciate their hard work is my favourite part of NIPS. I hope that never changes.

I’ll avoid commenting on any of the talks since, invited speakers apart, each talk is derived from a poster. There are over 100 on display each session, and four sessions in total. I’ll report my favourite poster from each session. You can find all the posters with links to the papers here.

Monday’s poster: Improved Multimodal Deep Learning with Variation of Information [Mon59]

Deep learning has been successfully applied to multimodal representation learning problems, with a common strategy to learning joint representations that are shared across multiple modalities on top of layers of modality-specific networks. Nonetheless, there still remains a question how to learn a good association between data modalities; in particular, a good generative model of multimodal data should be able to reason about missing data modality given the rest of data modalities. In this paper, we propose a novel multimodal representation learning framework that explicitly aims this goal. Rather than learning with maximum likelihood, we train the model to minimize the variation of information. We provide a theoretical insight why the proposed learning objective is sufficient to estimate the data-generating joint distribution of multimodal data. We apply our method to restricted Boltzmann machines and introduce learning methods based on contrastive divergence and multi-prediction training. In addition, we extend to deep networks with recurrent encoding structure to finetune the whole network. In experiments, we demonstrate the state-of-the-art visual recognition performance on MIR-Flickr database and PASCAL VOC 2007 database with and without text features.

Why it’s cool: Multi-modal data is commonplace in computational biology, so I’m all for models that incorporate it. What made this paper stand out for me was that they did not try to learn the model parameters by maximizing the likelihood of the data.

Tuesday’s poster: Nonparametric Bayesian inference on multivariate exponential families [Tue25]

We develop a model by choosing the maximum entropy distribution from the set of models satisfying certain smoothness and independence criteria; we show that inference on this model generalizes local kernel estimation to the context of Bayesian inference on stochastic processes. Our model enables Bayesian inference in contexts when standard techniques like Gaussian process inference are too expensive to apply. Exact inference on our model is possible for any likelihood function from the exponential family. Inference is then highly efficient, requiring only O(log N) time and O(N) space at run time. We demonstrate our algorithm on several problems and show quantifiable improvement in both speed and performance relative to models based on the Gaussian process.

Why it’s cool: They promise to deliver a really big reward: exact inference for *any* likelihood function in the exponential family. Kernel density estimation for non-parametric Bayesian models? Far out.

Wednesday’s poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization [Web39]

A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance.

Why it’s cool: Great work by Yann Dauphin and Razvan Pascanu, in conjunction with Surya Ganguli. They used arguments from statistical physics, and some neat experiments to demonstrate that what people originally thought to be bad local minima in the search for parameters in deep networks are actually saddle points which are hard to escape. They propose a modified quasi-Newton method which works well on large auto encoder methods . Even better, their code will soon be released in Theano. Maybe my favourite poster.

Thursday’s poster: Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models [Thu63]

Gaussian processes (GPs) are a powerful tool for probabilistic inference over functions. They have been applied to both regression and non-linear dimensionality reduction, and offer desirable properties such as uncertainty estimates, robustness to over-fitting, and principled ways for tuning hyper-parameters. However the scalability of these models to big datasets remains an active topic of research. We introduce a novel re-parametrisation of variational inference for sparse GP regression and latent variable models that allows for an efficient distributed algorithm. This is done by exploiting the decoupling of the data given the inducing points to re-formulate the evidence lower bound in a Map-Reduce setting. We show that the inference scales well with data and computational resources, while preserving a balanced distribution of the load among the nodes. We further demonstrate the utility in scaling Gaussian processes to big data. We show that GP performance improves with increasing amounts of data in regression (on flight data with 2 million records) and latent variable modelling (on MNIST). The results show that GPs perform better than many common models often used for big data.

Why it’s cool: Distrubuted computation of GP and GPLVMs. Their key insight is that, conditioned on inducing points, the observations decouple, so p(F_i | X, u) and p(Y_i | F_i) can be evaluated in parallel for every i.

That’s all for now. If you enjoy Montreal, and are thinking of attending NIPS next year, then rejoice; it will be located in Montreal again. The Palais des Congrès was a bright, spacious, easily accessible venue. I can’t wait to return.

# Forgotten drafts: Headed to NIPS2013

[ed: I wrote this while stuck in the Denver airport after a missed connection to Reno (final destination: NIPS), and summarily forgot about it. I’m beginning to digest some NIPS papers, so it feels right to post this now.]

I write this while stuck in the Denver airport after a missed connection. I’m very excited to attend NIPS for the first time, though a little nervous. With so much material presented each day, for five days straight, I’m worried that I’m going to miss something important. Already I’m missing the first few hours of the first poster session through this delay

In the meantime, here are the posters I’m going to focus on for the Thursday night session. The great thing about NIPS is that all papers are up, and easily imported into Mendeley.

# A belated MLSB 2013 retrospective

MLSB is a special interest group meeting that is co-located with ISMB, the Intelligent Systems for Molecular Biology conference. MLSB is devoted to Machine Learning research in the area of Systems Biology. It brings together machine learning folks, statisticians, geneticists, and biologists; it was a great place to strike up discussion about how to do better analysis on biological data. The full program is available online, so here I’m going to write up a few highlights for each day.

Friday

Markus Heinonen gave a solid talk about time dependent GP regression and significance testing for sparse time series. His group was examining the behaviour of cells under irradiation: how does this treatment affect both diseased and healthy cells with respect to gene regulation, cell processes, and the development of radiation disease. They pose this as trying to determine which genes / proteins are differentially expressed in irradiated cells over a time course. He applied GP regression to time series, using significance tests for sparse time series to determine significant expression (or changes). A nice piece of work, but they didn’t provide a citation since it was probably not yet accepted. But I found a link via google scholar.

Lani Wu gave a good invited talk about two recent stories from their lab involving neutrophil locomotion: one about perturbation analysis, and the other about cellular heterogeneity. There wasn’t anything really new in their methods, but what made an impression was the thoroughness of their analysis. Their lab has a great record of turning out quality work. As a fellow high throughput image biologist, who knows all the pain involved when trying to extract meaning from images (which are usually generated in a really noisy way), I was impressed 🙂

Quaid Morris gave a concise and informative talk, presenting work by Shankar that extended Sara Mostafavi‘s work on 3Prop. It addresses function prediction where the input is a set of heterogeneous networks, as well as a group of genes, and the desired outputs are labels for the set of query genes. Their method is called LMGraph, and it incorporates feature information into the function prediction algorithm. They extract discriminative features from all available networks, and combine them via a simple weighting scheme. This has nice properties: it’s highly scalable, using efficient linear classifiers, and it combines both network with feature data to learn interpretable models.

Saturday

Johannes Stephan gave a cool talk about Mixed Random Forests. They developed a new model intended for GWAS data, to try and detect functional relationship between phenotype and genotype while accounting for confounding factors due to population structure.

The standard approach to addressing population structure is to use a linear mixed model:
$y = x\beta + \mu + \epsilon$
$\mu \sim N(0,\sigma^{2}_{g}K)$ where K is the structured covariance matrix, $\epsilon \sim N(0, \sigma^{2}_{v}I)$
Then do a log ratio test on Beta coefficients.

A more difficult model is one where many SNPs that have interaction effects which are non-linear
$f_s(X) = X_1\beta_1 + X_2\beta_2 + \ldots$

Their idea is to combine linear mixed models with tree based models to infer $f_s(X)$ in the presence of the population effect. The nice thing about growing trees is that the method is fast, but are unstable with regards to small changes in input (highly discontinuous). This instability can be mitigated by subsampling and building many trees (RFs). The take home message is to use RF for feature selection, they use linear mixed models for prediction.

Finally, John Marioni gave a really interesting technical talk about the challenges in single cell RNA sequencing. The big idea is: multiple cells from a population -> single cell RNA-seq -> correlate patterns across cells -> infer cell type.

The big challenges remaining:

• quantify variability, identify highly variable genes
• visualization and clustering
• downstream interpretation

John described several technical aspects of experiments in single cell RNA-seq. Long story short: technical variability is still a problem, e.g count value ranges quickly increase when technical noise is large. One example nice bit of work was their work to estimate kinetic model parameters for stochastic models of gene expression from single cell RNA expression data. This talk again didn’t contain anything so new and shiny, but the quality and thoroughness of the work presented that captured my attention. This was my first exposure to real data from single cell RNA seq experiments, which alone was enough to impress me.

# CSHL Automated Imaging meeting: session 1 recap

I’m at the second Cold Spring Harbour Labs meeting on High-Throughput Imaging and Automated Phenotyping. The first session, which took place tonight, was heavily weighted towards the physics of microscopy and different technological implementations and innovations. But it’s been a long day, so I’ll put off a summary until tomorrow.

Edit: life has intervened and I’ll have to wait until an evening this week, or even to the weekend before reviewing my notes and recapping the proceedings. Lots of interesting talks, a few of which I will summarize with commentary.

# DCS distinguished lecture: Jeanette Wing

Eagerly awaiting Jeanette Wing’s talk on computational thinking.  Despite the dungeon-like lighting here in the BA lecture halls, I’m excited to learn something.

# Quick morphology update

Not much I can publish on here yet, but we’re making real progress on a few different morphological phenotype classification problems.

In other news, a couple of interesting workshops and research days are coming up that I’m excited about. On the 29th, Rob Tibshirani‘s coming to town for the U of T biostatistics research day. While Rob Tibshirani’s keynote will assuredly be worth going to, all the talks that day are promising ones.

Looking further ahead, there’s a summer school in statistical genetics at the University of Washington. One of my statistical role models John Storey is co-leading a short course on statistical approaches to data analysis in gene expression profiling. It’s not cheap and it’s a little out of the way, but I’d really love to try and catch this course.