## Lancelot James’ general theory for IBPs

February 9, 2015

When I visited Hong Kong last December, Lancelot James gave me a tutorial on his generalised theory of Indian Buffet Processes, which includes all sorts of interesting extensions that are starting to appear in the machine learning community.

The theory is written up in his recent Arxiv report, “Poisson Latent Feature Calculus for Generalized Indian Buffet Processes.”  I’m a big fan of Poisson processes, so its good to see the general theory written up, including a thorough posterior analysis.  In my recent tutorial (given at Aalto) I’ve included a simplified version of this with the key marginal posterior for a few well known cases.  Piyush Rai from Duke also has great summary slides as part of Lawrence Carin’s reading group at Duke.

Within this framework, you can now handle variations like the Beta-negative-Binomial Process of Mingyuan Zhou and colleagues presented at AI Stats 2012 (Arxiv version here), for which many different variations can be developed.

I’ve presented the general case as an application of improper priors because the Poisson process theory in this case is mostly just formalising their use for developing “infinite parameter vectors”.  Lancelot uses his Poisson Process Partition Calculus to develop a clean, general theory.  For me it was interesting to see that his results also apply to multivariate and auxiliary parameter vectors:

• an infinite vector of Gamma scales $\vec\beta$ created using a Gamma process can have an auxiliary vector of Gamma shapes $\vec\alpha$ sampled pointwise, so $\alpha_k$ is associated with $\beta_k$
• most of the scales are infinitesimal
• most of the shapes are not infinitesimal
• an infinite set of $K+1$-dimensional multinomial’s can be created where the first $K$ dimensions are generated using a Dirichlet style Poisson rate:
• most of the multinomials will have infinitesimal values in the first $K$ dimensions
• the Beta process corresponds to the case where $K=1$

## Talk given around Europe, Jan 2015

January 30, 2015

This talk, PDF slide here, is titled “Non-parametric Methods for Unsupervised Semantic Modelling” and is really a two-hour talk derived from the HKUST talk in December.  I updated it continuously during January and gave it at Helsinki, IJS, UCL, Cambridge and Oxford. It contains my simplified version of Lancelot James‘ excellent theory of generalised Indian Buffet Processes,“Poisson Latent Feature Calculus for Generalized Indian Buffet Processes,” which is on Arxiv 2014.  My version explains things differently with heuristics. The talk I gave at JSI (Jozef Stefan Institute in Ljublana) on 14th Jan 2015 was recorded.  The group here, with Dunja Mladenić and Marko Grobelnik, are expert in areas like Data Science and Text Mining, but they’re not into Bayesian non-parametrics, so in this version of the talk I mostly avoided the statistical details and talked more about what we did and why.  The talk is up on Video Lectures.

Abstract: This talk will cover some of our recent work in extended topic models to serve as tools in text mining and NLP (and hopefully, later, in IR) when some semantic analysis is required. In some sense our goals are akin to the use of Latent Semantic Analysis. The basic theoretical/algorithmic tool we have for this is non-parametric Bayesian methods for reasoning on hierarchies of probability vectors. The concepts will be introduced but not the statistical detail. Then I’ll present some of our KDD 2014 paper (Experiments with Non-parametric Topic Models) that is currently the best performing topic model by a number of metrics.

## Latent IBP compound Dirichlet Allocation

January 30, 2015

I visited Ralf Herbrich’s new Amazon offices in Berlin on 8th January and chatted there on topic modelling.  A pleasant result for me was meeting both Cédric Archambeau and Jan Gasthaus, both having been active in my area.

With Cedric I discussed Latent IBP compound Dirichlet Allocation (in the long awaited IEEE Trans PAMI 2015 special issue on Bayesian non-parametics, PDF on Cédric’s webpage) model which combines a 3-parameter Indian Buffet Process with Dirichlets.  This was joint work with Balaji Lakshminarayanan and Guillaume Bouchard.

Their work improves on the earlier paper by Williamson, Wang, Heller, and Blei (2010) on the Focused Topic Model which struggled with using the IBP theory.  I’d originally ignored Williamson et al.s’ work because they only tested on toy data sets, despite it being such a great model.  The marginal posterior for the document-topic indicator matrix is given in Archambeau et al.s’ Equation (43) which they attribute to Teh and Görür (NIPS 2009) but its easily derived using Dirichlet marginals and Lancelot James’ general formulas for IBPs.  This is the so-called IBP compound Dirichlet. From there, its easy to derive a collapsed Gibbs sampler mirroring regular LDA.  Theory and sampling hyper-parameters for the 3-parameter IBP I describe in my Helsinki talk and coming tutorial.

This is a great paper with quality empirical work and the best results I’ve seen for non-parametric LDA (other than ours).  Their implementation isn’t performance tuned so their timing figures are not that indicative, but they ran on non-trivial data sets so its good enough.

Note for the curious, I ran our HCA code to duplicate their experimental results on the two larger data sets.  Details in the Helsinki talk.  Basically:

• They compared against prior HDP-LDA implementations so of course beat them substantially.
• Our version of HDP-LDA (without burstiness) works as well as their LIDA algorithm, their better one with IBPs on the topic and the word side, and is substantially faster.
• Our fully non-parametric LDA (DPs for documents, PYPs for words, no burstiness) beat their LIDA substantially.

So while we beat them with our superior collapsed Gibbs sampling, their results are impressive so I’m excited by the possibility of trying their methods.