Archive for the ‘theory’ Category


Issues about Dirichlet Process inconsistencies

January 20, 2014

I saw the NIPS 2013 paper by Miller and Harrison, “A simple example of Dirichlet process mixture inconsistency for the number of components,” and I had some issues with it.  A Dirichlet Process is a prior that says there is an infinite number of clusters in the mixture.  But at any one time, after seeing N data points with a concentration parameter of θ, it expects to see about λ = θ log(N/θ) clusters, plus or minus 3*sqrt(λ) or so, for N>θ>>0.  This approximation gives the famous “grows with log(N)” formula that some tutorials give for DPs.  Anyway, I cannot really see why this makes the DP inconsistent when the true model has a finite number of clusters, because a finite number of clusters is not in the prior!  It just means the DP is true to itself.  So this apparent inconsistency does not affect a Bayesian.
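To get a feel for the numbers, here is a small sketch that compares the approximation λ = θ log(N/θ) with the exact prior mean number of clusters, which under the Chinese restaurant process view of the DP is the sum of θ/(θ+i) for i = 0 … N-1. The function names and the particular values of N and θ are just my choices for illustration.

```python
import math

def expected_clusters_exact(n, theta):
    """Exact prior mean number of clusters after n data points:
    the sum over i = 0 .. n-1 of theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(n))

def expected_clusters_approx(n, theta):
    """The approximation quoted in the post: lambda = theta * log(n / theta)."""
    return theta * math.log(n / theta)

# Illustrative values only; any n much larger than theta behaves similarly.
for n in (1_000, 100_000):
    for theta in (5.0, 50.0):
        exact = expected_clusters_exact(n, theta)
        lam = expected_clusters_approx(n, theta)
        spread = 3 * math.sqrt(lam)  # the rough "plus or minus" from the post
        print(f"N={n:>6} theta={theta:>4}: exact={exact:8.2f} "
              f"lambda={lam:8.2f} +/- {spread:.2f}")
```

The two columns get closer as N grows relative to θ, which is where the approximation is meant to be used.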

This seems to be a basic confusion about the Dirichlet Process generally.  Some people think it can be used to estimate the “right number of clusters”.  Well, be careful.  I can change θ and get it to estimate a large or a small number of clusters!  We do the same with the number of topics in a non-parametric topic model.
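A quick way to see how strongly θ drives the cluster count is to simulate the Chinese restaurant process for a few values of θ. This is only a sketch of draws from the prior, with no data likelihood involved, so it shows what the DP expects before it sees anything; the function name, seed, and θ values are assumptions for illustration.

```python
import random

def crp_num_clusters(n, theta, rng):
    """Seat n customers by the Chinese restaurant process and
    return how many tables (clusters) end up occupied."""
    k = 0
    for i in range(n):
        # With i customers already seated, the next one opens a new
        # table with probability theta / (theta + i).
        if rng.random() < theta / (theta + i):
            k += 1
    return k

rng = random.Random(0)  # fixed seed so runs are repeatable
for theta in (0.1, 1.0, 10.0, 100.0):
    counts = [crp_num_clusters(1000, theta, rng) for _ in range(200)]
    print(f"theta={theta:>5}: mean clusters over 200 draws = "
          f"{sum(counts) / len(counts):.1f}")
```

Only the count is tracked here, because the chance of opening a new table depends on how many customers are seated and not on how they are arranged.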


Wikipedia on probability theory

July 11, 2013

I created a PDF map of probability theory, the stuff that matters for Bayesian analysis, using the concepts available from the Wikipedia, with clickable links to the actual pages. Open and view at 600% to read the writing!  It should be on an A2 page.

In some cases, critical stuff is missing, so I’ve just left an open box with a title there.  It’s broken into areas, and you follow the arcs backwards to get prerequisites.  Generally the Wikipedia material is pretty good!  Coverage of some areas is poor though (graphical models with plates, Pitman-Yor processes, etc.).

So you can get a pretty good education from the Wikipedia.  All that’s missing is exercises.  Note Wikipedia also has the concept of books, so for undergraduate statistics coverage you can see the Wikipedia Book on Statistics, but I like to see a map of relationships.

The original is in DOT and I generate a big PDF file with clickable Wikipedia icons.  Originally I output it to SVG, but the viewers would not scale up enough to the size of the page, so I used PDF instead.
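For anyone wanting to try something similar, here is a minimal sketch of how a map like this could be written out as DOT from Python. It is not my actual source file: the three concepts, the node names, and the function name are made up for illustration. It relies on Graphviz's `URL` node attribute for the links, and arcs run from a prerequisite to the concept that needs it.

```python
def concept_map_dot(concepts, prerequisites):
    """Build DOT text for a small concept map.

    concepts: dict mapping a node name to a Wikipedia page title.
    prerequisites: list of (prerequisite, concept) node-name pairs.
    """
    lines = ["digraph probability_map {", "  node [shape=box];"]
    for name, title in concepts.items():
        label = title.replace("_", " ")
        url = "https://en.wikipedia.org/wiki/" + title
        # URL is the Graphviz attribute that makes a node a link.
        lines.append(f'  {name} [label="{label}", URL="{url}"];')
    for before, after in prerequisites:
        # Arc from prerequisite to dependent concept, so reading
        # an arc backwards gives what to study first.
        lines.append(f"  {before} -> {after};")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical three-node example.
concepts = {
    "probability_space": "Probability_space",
    "random_variable": "Random_variable",
    "expected_value": "Expected_value",
}
prerequisites = [
    ("probability_space", "random_variable"),
    ("random_variable", "expected_value"),
]
print(concept_map_dot(concepts, prerequisites))
```

Saving the output as, say, `map.dot` and running `dot -Tpdf map.dot -o map.pdf` should give a PDF; whether the links come out clickable may depend on how your Graphviz was built.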

Any suggestions?