Archive for November, 2017


Asian Conference on Machine Learning

November 11, 2017

Heading off to ACML in Seoul to present the paper “A Word Embeddings Informed Focused Topic Model” on behalf of PhD student He Zhao.  He is off elsewhere, at ICDM in New Orleans, presenting another paper, “MetaLDA: a Topic Model that Efficiently Incorporates Meta Information”.  The MetaLDA algorithm incorporates Boolean side information, outperforming the comparison methods, while the newer WEI-FTM algorithm incorporates general side information in the form of a focused topic model.  He is a prolific coder, with some of his work on GitHub.

ACML is getting to be a great conference, with consistently strong invited talks and tutorials.  A worthy end-of-semester break for me.

In natural language processing and related fields, it has been shown that word embeddings can successfully capture both the semantic and syntactic features of words. They can serve as complementary information for topic models, especially in cases where word co-occurrence data is insufficient, such as with short texts. In this paper, we propose a focused topic model where word embeddings inform how a topic focuses on words.  Our model is able to discover more informed and focused topics with more representative words, leading to better modelling accuracy and topic quality. With a data augmentation technique, we derive an efficient Gibbs sampling algorithm that benefits from the full local conjugacy of the model.  We conduct extensive experiments on several real-world datasets, which demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic coherence, particularly in handling short text data.
Keywords: Topic Models, Word Embeddings, Short Texts, Data Augmentation
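
To give a flavour of the idea, here is a minimal, hypothetical sketch of how embedding similarity could drive a per-topic word-focus probability. This is not the paper's actual model or notation: the dimensions, the random embeddings, and the sigmoid link are all illustrative assumptions, meant only to show how side information from embeddings can bias which words a topic focuses on.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, D = 6, 2, 3                      # vocab size, topics, embedding dim (made up)
word_emb = rng.normal(size=(V, D))     # pretrained word embeddings (assumed given)
topic_emb = rng.normal(size=(K, D))    # per-topic embeddings (hypothetical parameters)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Probability that topic k "focuses" on word v: here simply a sigmoid of
# the embedding similarity, so words near a topic in embedding space are
# more likely to be in that topic's focus set.
focus_prob = sigmoid(topic_emb @ word_emb.T)   # shape (K, V), entries in (0, 1)

# Sample a binary focus indicator per (topic, word) pair, as a focused
# topic model would inside its sampler.
focus = rng.random((K, V)) < focus_prob
```

In a real focused topic model the focus indicators would be resampled jointly with the topic assignments inside the Gibbs sampler; the point of the sketch is only that the embedding similarity replaces an uninformed prior on which words each topic can use.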