For the next two days, 28th and 29th December, I’ll be giving a tutorial at KAIST, hosted by Alice Oh. We just flew in last night from visiting Chengdu and Xi’an in China. The tutorial is based on the Introduction to Data Science unit, FIT5145, at Monash.

Insights into components of prior research and aspects of latent academia

From 11th to 14th January 2016 I’ll be visiting the School of IT at Monash University Malaysia, which is located within the Bandar Sunway township, just outside Kuala Lumpur city. My talk should be on the Monday (11th). The slides are here (available temporarily).

**Title: Introduction to Data Science**

This 2-hour seminar works through some of the emerging highlights of Data Science, reviewing major videos, blogs and articles that helped mold the field. It looks at processes and case studies to understand the many facets of working with data, and the significant effort in Data Science over and above the core task of Data Analysis. So the seminar is a broad introduction to working with data rather than a deep dive into the world of statistics. It is aimed at those with an IT background who either want to start in Data Science or work with it, for instance in management or as a data engineer. Attendees should have a knowledge of information technology and computer science.

The talk will be extracted from our FIT5145 unit given in the Master of Data Science.

The ML Bootcamp is a joint University of Warwick and Monash University programme organised by PhD students. Really great programme with all sorts of cool stuff in data science. My tutorial is Introducing Document Analysis (pdf slides). This is a “grand tour” tutorial, giving lots of examples rather than properly covering any particular theories or algorithms.

An earlier talk I gave, on a related topic, is Introduction to Text Mining (PDF slides), originally given to a business-technical audience in 2014. So this is more a motivational talk on text mining, why it is useful and why it is difficult.

This is an updated version of the pan-European talk, broadened a bit, again to remove the non-parametric minutiae. I was lucky to be visiting Waikato to attend Antti Puurula’s thesis defense. The PDF slides are here.

TITLE: Non-parametric Methods for Unsupervised Semantic Modelling

ABSTRACT:

This talk will cover some of our recent work in extended topic models to serve as tools in text mining and NLP (and hopefully, later, in IR) when some semantic analysis is required. In some sense our goals are akin to the use of Latent Semantic Analysis. The basic theoretical/algorithmic tools we have for this are non-parametric Bayesian methods for reasoning on hierarchies of probability vectors. The concepts will be introduced but not the statistical detail. Then I’ll present some results from our KDD 2014 paper (Experiments with Non-parametric Topic Models), whose model is currently the best-performing topic model by a number of metrics.

Here’s a good deep-dive talk by Zoubin Ghahramani from Cambridge, slides only in PDF.

The Sydney 2015 MLSS summer school is organised by Edwin Bonilla and held in Sydney, Feb 16-25th. My tutorial is titled “Models for Probability/Discrete Vectors with Bayesian Non-parametric Methods.” My final version of the slides is here in PDF.

The talk I gave at JSI (Jozef Stefan Institute in Ljubljana) on 14th Jan 2015 was recorded. The group here, with Dunja Mladenić and Marko Grobelnik, are experts in areas like Data Science and Text Mining, but they’re not into Bayesian non-parametrics, so in this version of the talk I mostly avoided the statistical details and talked more about what we did and why. The talk is up on Video Lectures. The original PDF of the EU talk sequence was on this post.