Latent IBP compound Dirichlet Allocation

January 30, 2015

I visited Ralf Herbrich’s new Amazon offices in Berlin on 8th January and chatted there about topic modelling.  A pleasant result for me was meeting both Cédric Archambeau and Jan Gasthaus, who have both been active in my area.

With Cédric I discussed the Latent IBP compound Dirichlet Allocation model (in the long-awaited IEEE Trans. PAMI 2015 special issue on Bayesian non-parametrics; PDF on Cédric’s webpage), which combines a 3-parameter Indian Buffet Process with Dirichlets.  This was joint work with Balaji Lakshminarayanan and Guillaume Bouchard.

Their work improves on the earlier paper by Williamson, Wang, Heller, and Blei (2010) on the Focused Topic Model, which struggled with using the IBP theory.  I’d originally ignored Williamson et al.’s work because they only tested on toy data sets, despite it being such a great model.  The marginal posterior for the document-topic indicator matrix is given in Archambeau et al.’s Equation (43), which they attribute to Teh and Görür (NIPS 2009), but it’s easily derived using Dirichlet marginals and Lancelot James’ general formulas for IBPs.  This is the so-called IBP compound Dirichlet.  From there, it’s easy to derive a collapsed Gibbs sampler mirroring regular LDA.  I describe the theory and hyper-parameter sampling for the 3-parameter IBP in my Helsinki talk and forthcoming tutorial.
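For readers wanting the shape of the construction, here is a minimal sketch of the two-parameter version as it appears in Williamson et al.’s Focused Topic Model; Archambeau et al. generalise the stick-breaking weights to the 3-parameter IBP, and the symbols below follow the usual FTM presentation rather than either paper’s exact notation:

```latex
% Sketch of the IBP compound Dirichlet (Focused Topic Model form).
% Archambeau et al. replace the Beta sticks with the 3-parameter IBP version.
\begin{align*}
\nu_j &\sim \mathrm{Beta}(\alpha, 1), \quad
  \pi_k = \prod_{j=1}^{k} \nu_j        && \text{(IBP stick-breaking weights)} \\
\phi_k &\sim \mathrm{Gamma}(\gamma, 1) && \text{(relative mass of topic $k$)} \\
b_{dk} &\sim \mathrm{Bernoulli}(\pi_k) && \text{(is topic $k$ on in document $d$?)} \\
\theta_d &\sim \mathrm{Dirichlet}(b_d \odot \phi) && \text{(proportions over active topics only)}
\end{align*}
```

Integrating out each θ_d gives the usual Dirichlet-multinomial ratios of Gamma functions, and integrating out the sticks gives the IBP marginal over the binary matrix, which is how an Equation (43)-style marginal posterior falls out.  The token-level update then looks just like collapsed LDA restricted to the document’s active topics.  Here is a minimal sketch of that single step, assuming FTM-style Dirichlet weights gamma and the usual count tables; the function and variable names are mine, not from anyone’s code, and a full sampler would also resample the indicators b and the IBP hyper-parameters:

```python
import numpy as np

def resample_token(d, i, w, z, b, n_dk, n_kw, n_k, gamma, beta, V):
    """One collapsed Gibbs step for token i (word id w) of document d.

    Mirrors the standard collapsed LDA update, except candidate topics
    are restricted to the document's active set b[d] from the IBP matrix.
    """
    k_old = z[d][i]
    # remove the token's current assignment from the count tables
    n_dk[d, k_old] -= 1
    n_kw[k_old, w] -= 1
    n_k[k_old] -= 1

    active = np.flatnonzero(b[d])  # topics switched on for this document
    # LDA-like conditional: (doc-topic count + prior) * word-topic ratio
    weights = (n_dk[d, active] + gamma[active]) \
        * (n_kw[active, w] + beta) / (n_k[active] + V * beta)
    k_new = np.random.choice(active, p=weights / weights.sum())

    # add the token back under its new assignment
    n_dk[d, k_new] += 1
    n_kw[k_new, w] += 1
    n_k[k_new] += 1
    z[d][i] = k_new
```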

This is a great paper with quality empirical work and the best results I’ve seen for non-parametric LDA (other than ours).  Their implementation isn’t performance tuned, so their timing figures are not that indicative, but they ran on non-trivial data sets, so it’s good enough.

A note for the curious: I ran our HCA code to duplicate their experimental results on the two larger data sets.  Details are in the Helsinki talk.  Basically:

  • They compared against prior HDP-LDA implementations, so of course they beat them substantially.
  • Our version of HDP-LDA (without burstiness) works as well as their LIDA algorithm, their better one with IBPs on the topic and the word side, and is substantially faster.
  • Our fully non-parametric LDA (DPs for documents, PYPs for words, no burstiness) beat their LIDA substantially.

So while we beat them with our superior collapsed Gibbs sampling, their results are impressive, and I’m excited by the possibility of trying their methods.

One comment

  1. Hi, I read the LIDA paper and I don’t think I understood it that well, but seeing your comment:

    “their better one with IBPs on the topic and the word side, and is substantially faster”

    Can you apply this to standard factor analysis too, where X = AF + e, such that both A and F have IBP priors on them?


