

Advanced Methodologies for Bayesian Networks

August 22, 2017

The 3rd Workshop on Advanced Methodologies for Bayesian Networks was run in Kyoto September 20-22, 2017. The workshop was well organised, and the talks were great. Really good invited talks by great speakers!

I’ll be talking about our (with François Petitjean, Nayyar Zaidi and Geoff Webb) recent work with Bayesian Network Classifiers:

Backoff methods for estimating parameters of a Bayesian network

Various authors have highlighted inadequacies of BDeu-type scores, and this problem is shared in parameter estimation. Basically, Laplace estimates work poorly, at least because setting the prior concentration is challenging. In 1997, Friedman et al. suggested a simple backoff approach for Bayesian network classifiers (BNCs). Backoff methods dominate in n-gram language models, with modified Kneser-Ney smoothing being the best known, and a Bayesian variant exists in the form of Pitman-Yor process language models from Teh in 2006. In this talk we will present some results on using backoff methods for Bayesian network classifiers and Bayesian networks generally. For BNCs at least, the improvements are dramatic and alleviate some of the issues of choosing too dense a network.
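To give a feel for what "backoff" means for a conditional probability table, here is a minimal Python sketch. It is not the method from the talk or the released code: it is a plain interpolated backoff in which the estimate for a full parent context is smoothed towards the estimate with the last parent dropped, and so on down to a uniform distribution. The function name, the fixed parent ordering and the single `alpha` concentration are all assumptions made for illustration.

```python
from collections import Counter


def backoff_estimate(data, child, parents, alpha=1.0):
    """Return prob(x, parent_values) for `child` given `parents` with simple backoff.

    data: list of dicts mapping variable name -> value.
    parents: ordered list of parent names; backoff drops the last one first.
    alpha: hypothetical concentration (> 0) controlling how strongly each
           level is pulled towards the level below it.
    """
    n_values = len({row[child] for row in data})
    # One (joint, marginal) pair of count tables per context length:
    # no parents, first parent, first two parents, ...
    levels = []
    for k in range(len(parents) + 1):
        ctx_vars = parents[:k]
        joint = Counter((tuple(r[v] for v in ctx_vars), r[child]) for r in data)
        marg = Counter(tuple(r[v] for v in ctx_vars) for r in data)
        levels.append((joint, marg))

    def prob(x, parent_values):
        # Start from uniform, then refine by adding one parent at a time.
        p = 1.0 / n_values
        for k, (joint, marg) in enumerate(levels):
            ctx = tuple(parent_values[:k])
            p = (joint[(ctx, x)] + alpha * p) / (marg[ctx] + alpha)
        return p

    return prob


if __name__ == "__main__":
    rows = [
        {"cls": "a", "f1": 0, "x": "u"},
        {"cls": "a", "f1": 0, "x": "u"},
        {"cls": "a", "f1": 1, "x": "v"},
        {"cls": "b", "f1": 1, "x": "v"},
    ]
    p = backoff_estimate(rows, "x", ["cls", "f1"])
    print(p("u", ("a", 0)), p("u", ("b", 1)))
```

A context never seen in the data simply inherits the estimate from the shorter context, which is the behaviour a single Laplace-style prior cannot give you.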

Slides are at the AMBN site, here.  Note I spent a bit of time embellishing my slides with some fabulous historical Japanese artwork!

Software for the system is built on the amazing Chordalysis system of François Petitjean, and the code is available as HierarchicalDirichletProcessEstimation.  Boy, Nayyar and François really can do good empirical work!


Is C that bad of a programming language?

May 2, 2016

The following quote about C comes from a Quora answer to the above question:

I don’t think C gets enough credit. Sure, C doesn’t love you. C isn’t about love–C is about thrills. C hangs around in the bad part of town. C knows all the gang signs. C has a motorcycle, and wears the leathers everywhere, and never wears a helmet, because that would mess up C’s punked-out hair. C likes to give cops the finger and grin and speed away. Mention that you’d like something, and C will pretend to ignore you; the next day, C will bring you one, no questions asked, and toss it to you with a you-know-you-want-me smirk that makes your heart race. Where did C get it? “It fell off a truck,” C says, putting away the boltcutters. You start to feel like C doesn’t know the meaning of “private” or “protected”: what C wants, C takes. This excites you. C knows how to get you anything but safety. C will give you anything but commitment. In the end, you’ll leave C, not because you want something better, but because you can’t handle the intensity. C says “I’m gonna live fast, die young, and leave a good-looking corpse,” but you know that C can never die, not so long as C is still the fastest thing on the road.

I love it.  Still do most of my programming in C using a mix of vi, emacs, gdb, valgrind and all that good old stuff, resorting to Python/Perl scripts sometimes for automation. Know I should be using a proper development UI and loading up my code with bulky libraries like Boost, and using complex install systems like CMake and Autoconf, hell, why not even Imake (done all these in the past).  I should also be using great inventions like multiple inheritance, operator overloading and recursive templates, but I find C’s simple approach to memory handling and functions just a lot safer.

Most of my students use Java though.  Automatic garbage collection and the UI seem to be what they like, as well as the loads of good code to work on out there.


Visualising a topic model

March 25, 2016

Finally decided to write a proper visualiser for topic models.   I used the WordCloud Python tool from AMueller[GitHub].   I modified it because I needed the input to be words with precomputed scores, rather than raw text.  Moreover, I wanted two dimensions for the words displayed: size (word frequency in the topic) and lightness (the degree to which the word is characteristic of the topic, measured as frequency over document frequency).  I also scale the final tag cloud depending on the size of the topic in the corpus.  The correlation between topics is computed from the document-topic proportions.  All these then go into GraphViz, where nodes are displayed as images, along with a lot of careful weighting and organising of the number of topic correlations to display, edge weights, etc.
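As a rough illustration of the pipeline just described (not the actual visualiser code), the Python sketch below computes the two per-word display dimensions and then writes a GraphViz graph whose nodes are topic images and whose edges are the stronger topic correlations. The function names, the scaling of lightness to a maximum of 1, the `min_corr` threshold and the image file names are all assumptions; `image`, `shape` and `weight` are standard GraphViz attributes.

```python
import math


def word_display_scores(topic_word_count, doc_freq):
    """Map each word of one topic to (size, lightness).

    size: the word's share of the topic's total count.
    lightness: count over document frequency, rescaled so the top word is 1
               (the rescaling is an assumption, not from the original tool).
    """
    total = sum(topic_word_count.values())
    ratio = {w: c / doc_freq[w] for w, c in topic_word_count.items()}
    top = max(ratio.values())
    return {w: (c / total, ratio[w] / top) for w, c in topic_word_count.items()}


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def topic_graph_dot(doc_topic, images, min_corr=0.2):
    """Build DOT text: one image node per topic, edges for correlated topics.

    doc_topic: list of per-document topic proportion lists.
    images: hypothetical tag-cloud image file for each topic.
    """
    lines = ["graph topics {", '  node [shape=none, label=""];']
    for t, img in enumerate(images):
        lines.append(f'  t{t} [image="{img}"];')
    for a in range(len(images)):
        col_a = [d[a] for d in doc_topic]
        for b in range(a + 1, len(images)):
            r = pearson(col_a, [d[b] for d in doc_topic])
            if r >= min_corr:  # only display the stronger correlations
                lines.append(f"  t{a} -- t{b} [weight={r:.2f}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(word_display_scores({"bank": 6, "the": 4}, {"bank": 10, "the": 100}))
    proportions = [[0.4, 0.4, 0.2], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8]]
    print(topic_graph_dot(proportions, ["t0.png", "t1.png", "t2.png"]))
```

The resulting text can be fed to a GraphViz layout engine such as fdp; choosing how many edges to keep is where most of the hand tuning goes.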

Below are the results on ABC news articles from their website 2003-2012, collected by Dr. Jinjing Li of NATSEM in Canberra.   These images are about 4000 by 4000 pixels.  You will not be able to view them unless you:

  • get on a big screen,
  • click on the image to enter image view mode,
  • then scroll down to bottom right, click on “View full size” to bring it up,
  • and then zoom around to view.

To produce the banking one, I run the following commands with hca:

#  generate the topic model into result set B1
hca -Ang -v -K50 -C1000 -q2 bank B1
#  compute the diagnostics
hca -v -v -V -V -r0 -C0 bank B1
#  generate the image --dot "-Kfdp" --lang png  B1 BN1


GitHub activity

November 24, 2015

So most of this year I spent doing the Introduction to Data Science (introductory unit at Monash) and getting the Grad. Dip. of Data Science and the Master of Data Science up and running (some background here).

As a result, you can see the disastrous impact it has had on my GitHub activity, which is a measure of my coding productivity!


Wray’s activity on GitHub for 2015


Experimental results for non-parametric LDA

November 12, 2014

Swapnil Mishra and I have been testing different software for HDP-LDA, the non-parametric version of LDA first published by Teh, Jordan, Beal and Blei in JASA 2006.  Since then ever more complex and theoretical approaches have been published for this, and it’s a common topic in recent NIPS conferences.  We’d noticed that the LDA implementation in Mallet has an asymmetric-symmetric version which is a truncated form of HDP-LDA, and David Mimno says this has been around since 2007, though Wallach, Mimno and McCallum published their results with it in NIPS 2009.  Another fast implementation is by Sato, Kurihara, and Nakagawa in KDD 2011.  The original version some people test against is Yee Whye Teh’s implementation from 2004, Nonparametric Bayesian Mixture Models – release 2.1.  This is impressive because it does “slow” Gibbs sampling and still works OK!

We’ve had real trouble comparing different published work because everyone has different ways of measuring perplexity, their test sets are hard to reconstruct, and sometimes their code works really badly and we’ve been unable to get realistic looking results.
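For reference, the final formula itself is rarely the problem: perplexity is the exponential of minus the average per-token log probability on the test set. The Python sketch below fixes just that last step. Where published numbers diverge is in how the predictive probability of each test word is obtained (for example document completion versus left-to-right style estimates), which is hidden here behind a hypothetical `word_prob` callback; the names are assumptions for illustration.

```python
import math


def perplexity(docs, word_prob):
    """exp(-(sum of log p(w)) / N) over all N tokens in the test documents.

    docs: list of token lists.
    word_prob: hypothetical callback (doc_index, token) -> predictive
               probability under whatever evaluation scheme is being used.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            log_lik += math.log(word_prob(d, w))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    # A model that is uniform over a 4-word vocabulary.
    print(perplexity([["a", "b"], ["c"]], lambda d, w: 0.25))
```

Two papers can both report "perplexity" from this formula and still be incomparable if their `word_prob` comes from different schemes or different test splits.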

Our KDD paper did some comparisons.  We’re re-running things more carefully now.  To compare against Sato et al.’s results, he kindly sent us his original data sets.   Their experimental work was thorough and precisely written up, so it’s a good one to compare with.  We’ve then re-run Mallet with their ASYM-SYM option to compare, along with our own two versions of HDP-LDA and our fully non-parametric version NP-LDA (puts a Pitman-Yor process on the word distributions and a Dirichlet process on the topic distributions).  Results in the plot below, lower perplexity is better.  Not sure about Teh’s original 2004 code, but our experience from other runs is that it doesn’t match these others.

Most of these are with 300 topics.  We’re impressed with how well the others performed.  Mallet is a lot faster because they use some very clever sampling designed for LDA.  Ours is the next fastest because our samplers are much better and it runs moderately well on 8 cores (cheating really).  More details on the comparison in a forthcoming paper.


New release of HCA

September 10, 2014

Version 0.61 just posted.  I’ve been adding some integration to make the diagnostics more useful.  Plus a few corrections to sampling.  Really need to clean up the code though.   uuuggghhh!


So Mallet’s asymmetric-symmetric LDA approximates HDP-LDA

June 10, 2014

So Swapnil and I were doing a stack of different comparisons with others’ software, and it dawned on me: the asymmetric-symmetric variant in Mallet’s LDA approximates HDP-LDA, first described in the hierarchical Dirichlet process article, the much cited non-parametric version of LDA.  In fact since 2008 it’s probably been the best implementation and no-one knew, and it’s blindingly fast too.  Though we got better performance, see our KDD paper.