Archive for the ‘talks’ Category


Invited talk at ACML in Beijing

October 11, 2018

I’m giving an invited talk at ACML in Beijing this coming November: see Invited Speakers on the ACML website.

I’m going to talk about the state of machine learning, contrasting the old with the new, and discuss where we may head next. I’ll also give some warnings about problems we are currently facing. PDF slides for the talk are here.

Something Old, Something New, Something Borrowed, Something Blue

Something Old: In this talk I will first describe some of our recent work with hierarchical probabilistic models that are not deep neural networks. Nevertheless, these are currently among the state of the art in classification and in topic modelling: k-dependence Bayesian networks and hierarchical topic models, respectively, and both are deep models in a different sense. These represent some of the leading-edge machine learning technology prior to the advent of deep neural networks.

Something New: On deep neural networks, I will describe as a point of comparison some of the state-of-the-art applications I am familiar with: multi-task learning, document classification, and learning to learn. These build on the RNNs widely used in semi-structured learning. The old and the new are remarkably different. So what are the new capabilities deep neural networks have yielded? Do we even need the old technology? What can we do next?

Something Borrowed: To complete the story, I’ll introduce some efforts to combine the two approaches, borrowing from earlier work in statistics.


ECML-PKDD talk on Bayesian network classifiers

September 14, 2018

On the 14th September 2018 I presented the following paper at ECML-PKDD in Dublin.  The slides for the talk are here.

We figured out how to do good smoothing of Bayesian network classifiers. The same technique works for decision trees, where it beat all the known smoothing/pruning algorithms we compared against!

“Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes”, by François Petitjean, Wray Buntine, Geoffrey I. Webb and Nayyar Zaidi, in Machine Learning, 18th May 2018, DOI 10.1007/s10994-018-5718-0. Available online at Springer Link. Presented at ECML-PKDD 2018 in Dublin in September 2018.



Some research papers on hierarchical models

May 15, 2018

“Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes”, by François Petitjean, Wray Buntine, Geoffrey I. Webb and Nayyar Zaidi, in Machine Learning, 18th May 2018, DOI 10.1007/s10994-018-5718-0. Available online at Springer Link. To be presented at ECML-PKDD 2018 in Dublin in September 2018.

Abstract This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is to show that improved parameter estimation allows BNCs to outperform leading learning methods such as random forest for both 0–1 loss and RMSE, albeit just on categorical datasets. As data assets become larger, entering the hyped world of “big”, efficient accurate classification requires three main elements: (1) classifiers with low bias that can capture the fine detail of large datasets; (2) out-of-core learners that can learn from data without having to hold it all in main memory; and (3) models that can classify new data very efficiently. The latest BNCs satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is made lower to accurately model classification tasks, so is the accuracy of their parameters’ estimates, as each parameter is estimated from ever-decreasing quantities of data. In this paper, we introduce the use of HDPs for accurate BNC parameter estimation even with lower bias. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with random forest in terms of prediction, while keeping the out-of-core capability and superior classification time.
Keywords Bayesian network · Parameter estimation · Graphical models · Dirichlet processes · Smoothing · Classification
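To give a feel for why hierarchical smoothing helps here, below is a minimal sketch of recursively shrinking a conditional probability table toward its parent context. This is my own illustration, not the paper’s algorithm: the HDP method learns a concentration parameter at every node of the hierarchy, where this sketch fixes a single m.

```python
from collections import Counter

def shrinkage_estimate(x, context, counts, values, m=2.0):
    """Estimate P(x | context) by recursive shrinkage.

    The estimate at each context is smoothed toward the estimate at the
    next-shorter context (drop the last parent), bottoming out at a
    uniform distribution. This mimics the posterior mean under a
    hierarchy of Dirichlet priors; the paper's HDP method additionally
    learns a concentration like m at every node in the tree.
    """
    if not context:
        prior = 1.0 / len(values)                  # root: uniform prior
    else:
        prior = shrinkage_estimate(x, context[:-1], counts, values, m)
    n = sum(counts[(v, context)] for v in values)  # data seen at this context
    return (counts[(x, context)] + m * prior) / (n + m)

# Toy data: class label with two parent attributes; index counts at
# every prefix of the parent vector so the recursion can back off.
counts = Counter()
for label, parents in [("spam", ("free", "caps")),
                       ("spam", ("free", "caps")),
                       ("ham",  ("free", "lower"))]:
    for i in range(len(parents) + 1):
        counts[(label, parents[:i])] += 1

print(shrinkage_estimate("spam", ("free", "caps"), counts, ["spam", "ham"]))
```

The point of the recursion is that a sparsely observed context (here, only two observations of (“free”, “caps”)) borrows strength from its better-populated ancestors rather than from a flat Laplace prior.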

“Leveraging external information in topic modelling”, by He Zhao, Lan Du, Wray Buntine & Gang Liu, in Knowledge and Information Systems, 12th May 2018, DOI 10.1007/s10115-018-1213-y.  Available online at Springer Link.  This is an update of our ICDM 2017 paper.

Abstract Besides the text content, documents usually come with rich sets of meta-information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this article, we present a topic model called MetaLDA, which is able to leverage either document or word meta-information, or both of them jointly, in the generative process. With two data augmentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta-information. Extensive experiments on several real-world datasets demonstrate that our model achieves superior performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, our model runs significantly faster than other models using meta-information.
Keywords Latent Dirichlet allocation · Side information · Data augmentation · Gibbs sampling
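For a flavour of how the document-side meta-information enters the generative process, here is a toy sketch of the kind of construction involved, as I read the model: each document’s Dirichlet prior over topics is tilted multiplicatively by its labels. The λ values below are made up for illustration; in MetaLDA they are inferred under gamma priors rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, n_labels = 4, 3
# lam[l, k]: per-label multiplicative effect on topic k's prior weight
# (hypothetical values; the model infers these with gamma priors)
lam = rng.gamma(shape=2.0, scale=0.5, size=(n_labels, n_topics))

def doc_alpha(f):
    """Dirichlet hyperparameters for one document from its binary labels.

    The prior weight of topic k is the product of the effects of the
    labels the document carries (labels with f_l = 0 contribute nothing),
    so labelled documents get topic priors tilted by their metadata.
    """
    f = np.asarray(f)
    return np.prod(lam ** f[:, None], axis=0)

alpha = doc_alpha([1, 0, 1])   # a document carrying labels 0 and 2
theta = rng.dirichlet(alpha)   # sample its topic proportions
print(alpha, theta)
```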

“Experiments with Learning Graphical Models on Text”, by Joan Capdevila, He Zhao, François Petitjean and Wray Buntine, in Behaviormetrika, 8th May 2018, DOI 10.1007/s41237-018-0050-3.  Available online at Springer Link.  This is work done by Joan Capdevila during his visit to Monash in 2017.

Abstract A rich variety of models are now in use for unsupervised modelling of text documents, and, in particular, a rich variety of graphical models exist, with and without latent variables. To date, there is inadequate understanding about the comparative performance of these, partly because they are subtly different, and they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state of the art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For matrix factorisation models, we use different hierarchical priors and asymmetric priors on components. We use Boolean matrix factorisation rather than topic models, so we can do comparable evaluations. The experiments perform a number of evaluations: probability for each document, omni-directional prediction which predicts different variables, and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning performed the best generally and, probably due to its lower bias, often out-performed hierarchical latent trees.
Keywords Graphical models · Document analysis · Unsupervised learning · Matrix factorisation · Latent variables · Evaluation





Research Career Perspective

March 23, 2018

I was asked to do a general review of my research over the last decade or two, or three, or four … Gawd, how long is it? Let’s just say I remember loading punch cards and tape drives, and being the first guy in the computer science class to use a … wait for it … a full-screen editor called “vi”, because when I started everyone used a line editor called “ed”. Students still remark on how I effortlessly switch between “vi”, “emacs”, and “perl -pi -e” during work. Not sure that’s good.

So the slides for the talk are here.    There is a lot more I could have said.



What’s Hot in IT

March 2, 2018

Attended the What’s Hot in IT event held by the Victorian ICT for Women group. See their Tweets about the event. Prof. Maria Garcia De La Banda (2nd from left) gave a fabulous overview.



Notes on Determinantal Point Processes

September 11, 2017

I’m giving a tutorial on these amazing processes while in Moscow. The source “book” for this is of course Alex Kulesza and Ben Taskar’s “Determinantal Point Processes for Machine Learning”, Foundations and Trends® in Machine Learning: Vol. 5: No. 2–3, pp 123–286, 2012.

If you have an undergraduate degree in mathematics with loads of multi-linear algebra and real analysis, this stuff really is music for the mind. The connections and results are very cool. In my view they don’t spend enough time in their intro on Gram matrices, which really are the starting point for everything. In their online video tutorials they got this right, and led with these results.
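To see why the Gram matrix is the natural starting point, here is a tiny numeric illustration of my own (not from the notes or the book): build the L-ensemble kernel as a Gram matrix of item feature vectors, and subset probabilities come out as determinants.

```python
import numpy as np

# Feature vectors for 3 items (columns of B). Items 0 and 1 are
# nearly parallel (similar); item 2 is close to orthogonal to both.
B = np.array([[1.0, 0.9, 0.0],
              [0.1, 0.0, 1.0]])
L = B.T @ B                     # Gram matrix = the DPP's L-ensemble kernel

def prob(S):
    """P(Y = S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    S = list(S)
    return np.linalg.det(L[np.ix_(S, S)]) / np.linalg.det(L + np.eye(len(L)))

print(prob([0, 1]))   # tiny: items 0 and 1 are near-duplicates
print(prob([0, 2]))   # much larger: the diverse pair is favoured
```

The determinant of a Gram submatrix is the squared volume spanned by the chosen feature vectors, which is exactly where the repulsion between similar items comes from.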

There are also a few interesting connections they didn’t mention. Anyway, I wrote some additional lecture notes giving some of the key results mentioned in the long article and elsewhere that didn’t make their tutorial slides.


Advanced Methodologies for Bayesian Networks

August 22, 2017

The 3rd Workshop on Advanced Methodologies for Bayesian Networks (AMBN) was run in Kyoto, September 20-22, 2017. The workshop was well organised, with really good invited talks by great speakers!

I talked about our recent work (with François Petitjean, Nayyar Zaidi and Geoff Webb) on Bayesian network classifiers:

Backoff methods for estimating parameters of a Bayesian network

Various authors have highlighted inadequacies of BDeu-type scores, and the problem is shared by parameter estimation. Basically, Laplace estimates work poorly, at least because setting the prior concentration is challenging. In 1997, Friedman et al. suggested a simple backoff approach for Bayesian network classifiers (BNCs). Backoff methods dominate in n-gram language models, with modified Kneser-Ney smoothing being the best known, and a Bayesian variant exists in the form of the Pitman-Yor process language models of Teh from 2006. In this talk we will present some results on using backoff methods for BNCs and for Bayesian networks generally. For BNCs at least, the improvements are dramatic and alleviate some of the issues of choosing too dense a network.
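For a flavour of the backoff recursion the abstract refers to, here is a toy, one-level-at-a-time version of the discount form behind Kneser-Ney smoothing and the Pitman-Yor posterior mean. This is my own sketch under simplifying assumptions (fixed discount, one “table” per observed value), not the estimator from the talk.

```python
def discount_backoff(counts, prior, d=0.5):
    """Discount-and-back-off estimate of P(value | context) at one level.

    counts : dict value -> count observed at this context
    prior  : dict value -> back-off distribution from the parent context
    d      : discount in [0, 1); mass removed from observed values is
             redistributed according to the parent distribution, which
             is essentially the interpolated Kneser-Ney recursion.
    """
    n = sum(counts.values())
    t = sum(1 for c in counts.values() if c > 0)   # distinct observed values
    return {v: (max(counts.get(v, 0) - d, 0) + d * t * prior[v]) / n
            for v in prior}

uniform = {"a": 1/3, "b": 1/3, "c": 1/3}
level1 = discount_backoff({"a": 5, "b": 1}, uniform)   # coarse context
level2 = discount_backoff({"a": 2}, level1)            # finer context
print(level2)   # "b" and "c" get probability mass only via back-off
```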

Slides are at the AMBN site, here.  Note I spent a bit of time embellishing my slides with some fabulous historical Japanese artwork!

Software for the system is built on the amazing Chordalysis system of François Petitjean, and the code is available as HierarchicalDirichletProcessEstimation.  Boy, Nayyar and François really can do good empirical work!