Visiting Helsinki

December 21, 2018

Visited my old workplace, the University of Helsinki, where I caught up with various faculty including Prof. Petri Myllymaki, Teemu Roos, Arto Klami and others. Across the bay is Aalto University, with folks like Prof. Sami Kaski. The machine learning groups in broader Helsinki are very strong, with loads of researchers, quality students, and a very vigorous start-up and high-tech culture.

Gave a talk at the Machine Learning Coffee Seminar (which serves porridge as well as coffee) on 17/12, and attended the AI Day on 12/12. The AI Day is organised by the recently convened Finnish Centre for Artificial Intelligence, and the speakers were senior researchers from Finland describing their broader vision and direction, so really interesting stuff and very educational. Lots of creativity in there, for instance combining simulation, machine learning, HCI and cognitive psychology.

Invited talk at ACML in Beijing

October 11, 2018

I gave an invited talk at ACML in Beijing in November: see Invited Speakers at the ACML website.

Wray's talk at ACML 2018

I talked about the state of Machine Learning, contrasting the old with the new, and discussed where we may head next. Moreover, I gave some warnings about some problems we are currently facing. PDF slides for the talk are here. The abstract is given below. Prof. Jun Zhu (Tsinghua U.) has had some similar ideas so we conferred afterwards.

Several of us from Monash went: in the picture are Ye Zhu, Wray Buntine, Lan Du, Yuan Jin and He Zhao.

Monash (past and present) at ACML 2018

Something Old, Something New, Something Borrowed, Something Blue

Something Old: In this talk I will first describe some of our recent work with hierarchical probabilistic models that are not deep neural networks. Nevertheless, these are currently among the state of the art in classification and in topic modelling: k-dependence Bayesian networks and hierarchical topic models, respectively, and both are deep models in a different sense. These represent some of the leading edge machine learning technology prior to the advent of deep neural networks.

Something New: On deep neural networks, I will describe as a point of comparison some of the state of the art applications I am familiar with: multi-task learning, document classification, and learning to learn. These build on the RNNs widely used in semi-structured learning. The old and the new are remarkably different. So what are the new capabilities deep neural networks have yielded? Do we even need the old technology? What can we do next?

Something Borrowed: to complete the story, I’ll introduce some efforts to combine the two approaches, borrowing from earlier work in statistics.

ECML-PKDD talk on Bayesian network classifiers

September 14, 2018

On 14th September 2018 I presented the following paper at ECML-PKDD in Dublin. The slides for the talk are here.

We figured out how to do good smoothing of Bayesian network classifiers.  The same technique works for decision trees, and in fact beats all known algorithms for smoothing/pruning!

“Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes”, by François Petitjean, Wray Buntine, Geoffrey I. Webb and Nayyar Zaidi, in Machine Learning, 18th May 2018, DOI 10.1007/s10994-018-5718-0. Available online at Springer Link. Presented at ECML-PKDD 2018 in Dublin in September, 2018.

Fabulous data science tag cloud

June 2, 2018

This comes from PhD student Caitlin Doogan.

Tag Cloud on Data by Caitlin Doogan

Graduating MDS students

May 25, 2018

Our first larger batch of MDS students graduating.   Here are some who attended the ceremony.  Really great students!

MDS Graduation May 2018

Some research papers on hierarchical models

May 15, 2018

“Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes”, by François Petitjean, Wray Buntine, Geoffrey I. Webb and Nayyar Zaidi, in Machine Learning, 18th May 2018, DOI 10.1007/s10994-018-5718-0. Available online at Springer Link. To be presented at ECML-PKDD 2018 in Dublin in September, 2018.

Abstract This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs).  The main result of this paper is to show that improved parameter estimation allows BNCs  to outperform leading learning methods such as random forest for both 0–1 loss and RMSE,  albeit just on categorical datasets. As data assets become larger, entering the hyped world of “big”, efficient accurate classification requires three main elements: (1) classifiers with low bias that can capture the fine-detail of large datasets (2) out-of-core learners that can learn from data without having to hold it all in main memory and (3) models that can classify new data very efficiently. The latest BNCs satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is made lower to accurately model classification tasks, so is the accuracy of their parameters’ estimates, as each parameter is estimated from ever decreasing quantities of data. In this paper, we introduce the use of HDPs for accurate BNC parameter estimation even with lower bias. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with random forest in terms of prediction, while keeping the out-of-core capability and superior classification time.
Keywords Bayesian network · Parameter estimation · Graphical models · Dirichlet processes · Smoothing · Classification
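
To make the smoothing idea in this abstract a bit more concrete, here is a minimal Python sketch of hierarchical back-off smoothing for one row of a probability table. It is not the HDP estimator from the paper: it uses a plain m-estimate at each level, and the function name `hierarchical_smooth`, the parameter `m` and the toy counts are all assumptions made up for illustration. The only point is to show how an estimate built from very little data can borrow strength from less specific contexts.

```python
def hierarchical_smooth(level_counts, values, m=1.0):
    """Smooth a distribution by backing off through a chain of contexts.

    level_counts: list of dicts mapping value -> count, ordered from the
        most general context (e.g. no parents) to the most specific one
        (e.g. all parents fixed). This layout is an illustrative assumption.
    values: all values the variable can take.
    m: hypothetical smoothing strength (equivalent sample size of the prior).
    """
    # Start from a uniform distribution over the possible values.
    probs = {v: 1.0 / len(values) for v in values}
    # Walk from general to specific, shrinking each level's empirical
    # estimate towards the smoothed estimate of the level above it.
    for counts in level_counts:
        total = sum(counts.get(v, 0) for v in values)
        probs = {
            v: (counts.get(v, 0) + m * probs[v]) / (total + m)
            for v in values
        }
    return probs


if __name__ == "__main__":
    # Toy counts: plenty of data with no parents, little with one parent,
    # and none at all once a second parent is added.
    levels = [{"yes": 60, "no": 40}, {"yes": 3, "no": 1}, {}]
    print(hierarchical_smooth(levels, ["yes", "no"], m=1.0))
```

In the toy example the most specific context has no data at all, so its estimate simply falls back to that of its parent context rather than being undefined.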

“Leveraging external information in topic modelling”, by He Zhao, Lan Du, Wray Buntine & Gang Liu, in Knowledge and Information Systems, 12th May 2018, DOI 10.1007/s10115-018-1213-y.  Available online at Springer Link.  This is an update of our ICDM 2017 paper.

Abstract Besides the text content, documents usually come with rich sets of meta-information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this article, we present a topic model called MetaLDA, which is able to leverage either document or word meta-information, or both of them jointly, in the generative process. With two data augmentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta-information. Extensive experiments on several real-world datasets demonstrate that our model achieves superior performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, our model runs significantly faster than other models using meta-information.
Keywords Latent Dirichlet allocation · Side information · Data augmentation · Gibbs sampling

“Experiments with Learning Graphical Models on Text”, by Joan Capdevila, He Zhao, François Petitjean and Wray Buntine, in Behaviormetrika, 8th May 2018, DOI 10.1007/s41237-018-0050-3.  Available online at Springer Link.  This is work done by Joan Capdevila during his visit to Monash in 2017.

Abstract A rich variety of models are now in use for unsupervised modelling of text documents, and, in particular, a rich variety of graphical models exist, with and without latent variables. To date, there is inadequate understanding about the comparative performance of these, partly because they are subtly different, and they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state of the art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For matrix factorisation models, we use different hierarchical priors, asymmetric priors on components. We use Boolean matrix factorisation rather than topic models, so we can do comparable evaluations. The experiments perform a number of evaluations: probability for each document, omni-directional prediction which predicts different variables, and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning performed the best generally, and probably due to its lower bias, often out-performed hierarchical latent trees.
Keywords Graphical models · Document analysis · Unsupervised learning · Matrix factorisation · Latent variables · Evaluation

The Big Tech Healthcare Invasion

April 25, 2018

The Big Tech Healthcare Invasion Infographic