Experimental results for non-parametric LDA

November 12, 2014

Swapnil Mishra and I have been testing different software for HDP-LDA, the non-parametric version of LDA first published by Teh, Jordan, Beal and Blei in JASA 2006. Since then, ever more complex and theoretical approaches have been published for this, and it’s a common topic at recent NIPS conferences. We’d noticed that the LDA implementation in Mallet has an asymmetric-symmetric version which is a truncated form of HDP-LDA, and David Mimno says this has been around since 2007, though Wallach, Mimno and McCallum published their results with it in NIPS 2009. Another fast implementation is by Sato, Kurihara, and Nakagawa in KDD 2011. The original version some people test against is Yee Whye Teh’s implementation from 2004, Nonparametric Bayesian Mixture Models – release 2.1. This is impressive because it does “slow” Gibbs sampling and still works OK!

We’ve had real trouble comparing different published work: everyone measures perplexity differently, test sets are hard to reconstruct, and sometimes the code works really badly and we’ve been unable to get realistic-looking results.
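To make the perplexity issue concrete, here is a minimal sketch of one common definition: the exponential of the negative average per-token log probability on test documents. The function name and the `theta`/`phi` inputs are hypothetical, chosen for illustration; this is not the measure used by any of the packages mentioned here. Much of the disagreement between papers comes from how `theta` is estimated for a held-out document, which this sketch simply takes as given.

```python
import math

def perplexity(docs, theta, phi):
    """Per-token perplexity of test documents under a fitted topic model.

    docs  : list of documents, each a list of word ids
    theta : theta[d][k] = proportion of topic k in document d (assumed given)
    phi   : phi[k][w]   = probability of word w under topic k
    """
    total_log_prob = 0.0
    total_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginalise over topics to get the probability of this token.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            total_log_prob += math.log(p)
            total_tokens += 1
    return math.exp(-total_log_prob / total_tokens)

# Toy check: with uniform word distributions over 4 words,
# the perplexity equals the vocabulary size.
docs = [[0, 1], [2, 3, 0]]
theta = [[0.5, 0.5], [0.3, 0.7]]
phi = [[0.25] * 4, [0.25] * 4]
print(perplexity(docs, theta, phi))
```

Two papers can plug the same `phi` into this formula and still report different numbers, depending on whether `theta` comes from part of the test document, from a separate sampler run, or from something else entirely.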

Our KDD paper did some comparisons. We’re re-running things more carefully now. To compare against Sato et al.’s results, Sato kindly sent us the original data sets. Their experimental work was thorough and precisely written up, so it’s a good one to compare with. We’ve then re-run Mallet with its ASYM-SYM option to compare, along with our own two versions of HDP-LDA and our fully non-parametric version NP-LDA (which puts a Pitman-Yor process on the word distributions and a Dirichlet process on the topic distributions). Results are in the plot below; lower perplexity is better. We’re not sure about Teh’s original 2004 code, but our experience from other runs suggests it doesn’t match these others.
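For readers unfamiliar with the two priors mentioned above, here is a rough sketch of truncated stick-breaking, which generates the weights of a Dirichlet process (discount 0) or a Pitman-Yor process (discount between 0 and 1). It only illustrates the priors; it is not the sampler used in our code, in Mallet, or in any of the other packages, and the function name and default values are assumptions made for the example.

```python
import random

def stick_breaking(concentration, discount=0.0, truncation=300, rng=None):
    """Draw truncated stick-breaking weights.

    discount = 0.0      gives Dirichlet process weights.
    0 < discount < 1    gives Pitman-Yor process weights.
    The last weight absorbs whatever stick is left, so the weights sum to 1.
    """
    rng = rng or random.Random(0)  # fixed seed by default, for repeatability
    weights = []
    remaining = 1.0
    for k in range(1, truncation):
        # Break off a Beta-distributed fraction of the remaining stick.
        v = rng.betavariate(1.0 - discount, concentration + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

dp_weights = stick_breaking(concentration=1.0, truncation=300)
py_weights = stick_breaking(concentration=1.0, discount=0.5, truncation=300)
print(sum(dp_weights), sum(py_weights))
```

The truncation level plays the same role as fixing a maximum number of topics: the prior decides how much weight each of them gets, and with a positive discount the weights decay more slowly, which suits word distributions with long tails.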

Most of these are with 300 topics. We’re impressed with how well the others performed. Mallet is a lot faster because it uses some very clever sampling designed for LDA. Ours is the next fastest because our samplers are much better and it runs moderately well on 8 cores (cheating really). More details on the comparison will appear in a forthcoming paper.

