Archive for the ‘conference’ Category

Invited talk at ACML in Beijing

October 11, 2018

I’m giving an invited talk at ACML in Beijing this coming November: see Invited Speakers on the ACML website.

I’m going to talk about the state of Machine Learning, contrasting the old with the new, and discuss where we may head next. I’ll also give some warnings about problems we are currently facing. PDF slides for the talk are here.

Something Old, Something New, Something Borrowed, Something Blue

Something Old: In this talk I will first describe some of our recent work with hierarchical probabilistic models that are not deep neural networks, yet are currently among the state of the art in classification and topic modelling: k-dependence Bayesian networks and hierarchical topic models, respectively. Both are deep models in a different sense, and they represent some of the leading-edge machine learning technology from before the advent of deep neural networks.

Something New: On deep neural networks, I will describe, as a point of comparison, some of the state-of-the-art applications I am familiar with: multi-task learning, document classification, and learning to learn. These build on the RNNs widely used in semi-structured learning. The old and the new are remarkably different. So what are the new capabilities deep neural networks have yielded? Do we even need the old technology? What can we do next?

Something Borrowed: To complete the story, I’ll introduce some efforts to combine the two approaches, borrowing from earlier work in statistics.

ECML-PKDD talk on Bayesian network classifiers

September 14, 2018

On the 14th of September 2018 I presented the following paper at ECML-PKDD in Dublin. The slides for the talk are here.

We figured out how to do good smoothing of Bayesian network classifiers.  The same technique works for decision trees, and in fact beats all known algorithms for smoothing/pruning!
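To give a feel for what hierarchical smoothing buys you: a deep leaf in a decision tree or Bayesian network classifier may have only a handful of examples, so its raw class frequencies are unreliable; hierarchical smoothing shrinks each node’s estimate toward its parent’s, so sparse leaves borrow strength from their ancestors. The paper’s actual method uses hierarchical Dirichlet processes; the sketch below illustrates only the simpler recursive back-off idea, and the function name `smoothed_probs`, the parameter `m`, and the counts are all made up for illustration.

```python
def smoothed_probs(path_counts, num_classes, m=1.0):
    """Back-off smoothing of class probabilities along a root-to-leaf path.

    path_counts: list of {class: count} dicts, ordered root to leaf.
    m: pseudo-count controlling how strongly each node is shrunk
       toward its parent's estimate.
    """
    # Start from a uniform prior at a virtual root above the tree.
    prior = {c: 1.0 / num_classes for c in range(num_classes)}
    for counts in path_counts:
        n = sum(counts.values())
        # Shrink this node's empirical frequencies toward the parent's
        # (already-smoothed) estimate; the result becomes the prior for
        # the next node down the path.
        prior = {c: (counts.get(c, 0) + m * prior[c]) / (n + m)
                 for c in range(num_classes)}
    return prior  # smoothed class distribution at the leaf
```

With counts [{0: 8, 1: 2}, {0: 1, 1: 0}] the leaf sees only one example, but its estimate stays close to the better-supported parent rather than collapsing to a degenerate (1.0, 0.0). The HDP version in the paper learns the strength of this shrinkage from data instead of fixing `m`.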

“Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes”, by François Petitjean, Wray Buntine, Geoffrey I. Webb and Nayyar Zaidi, in Machine Learning, 18th May 2018, DOI 10.1007/s10994-018-5718-0. Available online at Springer Link. Presented at ECML-PKDD 2018 in Dublin in September 2018.

 

Picking Conferences

January 7, 2018

As a PhD student starting out, you do have some career options. Likewise, as a typical junior academic, with limited budget and research time, you have similar career options. A main one which I’ll discuss here is: which conference(s) should I go to? This question is peculiar to computer scientists, whose conferences are competitive publication venues (say 20–25% acceptance rates) and count as publications.

So you only get time to attend a few conferences. Likewise, you only get time to write papers for a few. So you want them to count. Conferences each have their own style. The best way to think of it is that a conference is a tribe where membership is part-time. You have to take time to learn the habits and preferences of the tribe, i.e., in terms of paper content. If the tribe always starts papers off with 20% detailed theoretical definitions, then you have to as well. If they do certain kinds of experiments, then so should you. Think of these sorts of things as tribal markings. To be innovative, you generally need to do so from inside the system. I know this sounds conformist, and believe me, I am completely non-conformist myself, but generally it’s how conferences work, largely as a result of the reviewing system. If a trusted member of the tribe starts quoting classical, venerated philosophers, so will the others. If a complete unknown submits a paper quoting venerated philosophers, it’ll be viewed as weird unless their work carries enough other tribal markings to be accepted.

I have a number of conferences I really like, where I understand the general tribal markings and am happy to live with them: SIGKDD has solid experimental work, ICML has innovative new methods, and ACL has applications of machine learning to real linguistic problems. They sometimes have additional tribal markings that can be more or less problematic.

Anyway, as a junior academic, you have to target a few conferences and learn to become a reliable tribal member. You might want to pick a few authors and build on their work, or you might want to pick a specialised problem. Regardless, to publish in particular venues you’ll have to get to know the tribe’s preferences and adhere to them. Doing good research is one thing, and really good research will usually speak for itself, but if your contribution is not outstanding, say “merely” in the top 25th percentile of work, then you have to follow the tribe to be accepted into the tribe. That takes time.

Moreover, the vibe at the conference is always much, much more than the printed proceedings.  You need to be there:  hear the questions, watch the audience, chat to others in the breaks, see the quality of the presenters.  What is important and influential?  What is losing out, perhaps because it was trendy rather than productive?  All this happens at the conference.  You need to be there to see it.  Otherwise, you’ll be a year behind the others … new ideas for next year’s conference are often the germ of an idea at this year’s conference.  Moreover, it always helps to see the movers and shakers in action.  What sort of people are they?  How do they present their work?

So what does this mean for the junior academic? Early on, you need to target a particular conference, subject, or influential author’s/group’s body of work, and learn what it is they do. That’ll take time. So if you don’t see yourself being involved in that community five years down the track, you probably shouldn’t be making that effort. If you think their research doesn’t have a good future, then again, you probably shouldn’t be making that effort. Pick some conferences with this in mind, and try to go along semi-regularly to keep track of things and pick up the vibe.

Asian Conference on Machine Learning

November 11, 2017

Heading off to ACML in Seoul to present the paper “A Word Embeddings Informed Focused Topic Model” for PhD student He Zhao, who is off elsewhere, at ICDM in New Orleans, presenting another paper, “MetaLDA: a Topic Model that Efficiently Incorporates Meta Information”. The MetaLDA algorithm incorporates Boolean side information, beating all comparison methods, and the newer WEI-FTM algorithm incorporates general side information, but as a focused topic model. He is a prolific coder, with some of his work on GitHub.

ACML is getting to be a great conference, with consistently great invited talks and tutorials. A worthy end-of-semester break for me.

Abstract
In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information to topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model where how a topic focuses on words is informed by word embeddings. Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With a data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the fully local conjugacy of the model. We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.
Keywords: Topic Models, Word Embeddings, Short Texts, Data Augmentation

Reviewing in Machine Learning

June 11, 2017

A common subject for mutual commiseration in the community is the quality of reviewing. In huge and specialised conferences like NIPS and ICML, there are so many papers and so many reviewers that generally the match-up between reviewers and papers is quite good, as good as or better than for a journal article. In smaller conferences, like ACML, and for grant applications in relatively small places like Australia (e.g., the ARC), the match-up can be a lot poorer. This causes reviewer misunderstandings.

Of course, one needs to be aware of The Great NIPS Reviewing Experiment of 2014.  This is a grand applied statistical experiment that only machine learning folks could think of 😉  I’ll just mention this because it is important to understand that the reviewing process is very challenging, and we as a community are trying our hardest.

Now, I think it’s very reasonable for some reviewers not to be specialists in the subject of the paper, merely “knowledgeable”. After all, we would like the paper to be readable by more than just the 20 people who focus on that very specific topic. These non-specialist reviewers generally flag themselves, so the meta-reviewers and authors know to take their comments with a (respectful) grain of salt. But they can still be excellent in related and broadly applicable areas like experimental methodology and mathematical definitions of models, so they are still an important part of the reviewing ecosystem. This works when reviewers know their limitations. Unfortunately, reviewers don’t always do so.

But I still find general aspects of reviewing enlightening.

A case in point is our recent ICML 2017 paper “Leveraging Node Attributes for Incomplete Relational Data”. Two reviewers said strong accept and one a mild reject. For the would-be rejecter, the method was too simple. We knew this paper was not full of the usual theoretical complexities expected of an ICML one, of course, so we made sure the experimental work was rock solid. It was a risk submitting to ICML anyway, as anyone with experience knows the experimental work at ICML can be patchy; it’s not something reviewers generally look for. If you want quality experimental work in machine learning, go to the knowledge discovery conferences like SIGKDD, certainly not a machine learning conference!

The reason we submitted the paper to ICML was that this simple method beat all previous work handily, in predictive performance, speed, or both. Simplicity, it seems, has its advantages, and people should find out about it when it happens. But if it was so damn simple, why didn’t someone try it already (in truth, it wasn’t that simple)? And given it works so much better, shouldn’t people find out that for this problem all the ICML-ish model complexity of previous methods was unnecessary 😉 . Now we did add a tricky hierarchical part to our otherwise simple model, just to appease the “meta is better” crowd, and we’re now busy trying to figure out how to add a novel stochastic process (something I love to do).

But unnecessary complexity is something I’m not a big fan of. My favorite example of this is papers starting off with two pages of stochastic process theory before, finally, getting to the model and implementation. But the model they implement is a truncated one: it is completely finite and requires no stochastic process theory to analyse in any way. In a longer journal format, linking the truncated version with the full stochastic process theory is important to round off the treatment. In a short-format paper with considerable experimental work, details of Lévy processes are unnecessary if real non-parametric methods are not actually used in the algorithmic theory.

ICML 2017 paper: Leveraging Node Attributes for Incomplete Relational Data

May 19, 2017

Here is a paper with Ethan Zhao and Lan Du, both of Monash, which we’ll present in Sydney.

Relational data are usually highly incomplete in practice, which inspires us to leverage side information to improve the performance of community detection and link prediction. This paper presents a Bayesian probabilistic approach that incorporates various kinds of node attributes, encoded in binary form, in relational models with a Poisson likelihood. Our method works flexibly with both directed and undirected relational networks. The inference can be done by efficient Gibbs sampling which leverages the sparsity of both networks and node attributes. Extensive experiments show that our models achieve state-of-the-art link prediction results, especially with highly incomplete relational data.

As usual, the reviews were entertaining, and there are some interesting results that didn’t make it into the paper. It’s always enlightening doing comparative experiments.

KDD 2014

August 28, 2014

My first KDD conference in a while!

Swapnil Mishra and I went to present our paper, “Experiments with Non-parametric Topic Models” (… a link to the ACM page … we paid a lot of money to make this paper available free for all). The conference slides we presented are here, and the software, called hca, is available on MLOSS. The talk on VideoLectures.net is up now.

Also ran into Aaron Li and Alex Smola who spoke after us.  I really have to implement their trick!