
Visualising a topic model
March 25, 2016

I finally decided to write a proper visualiser for topic models. I used the WordCloud Python tool by AMueller (on GitHub). I modified it because I needed to feed it words with precomputed scores rather than raw text. I also wanted each word displayed along two dimensions: size, which is the word's frequency in the topic, and lightness, which is the degree to which the word is characterised by the topic, measured as its frequency over its document frequency. I then scale each final tag cloud according to the size of its topic in the corpus. The correlation between topics is computed from the document-topic proportions. All of these then go into Graphviz, where the nodes are displayed as images. A lot of careful weighting and organising goes into choosing how many topic correlations to display, the edge weights, and so on.
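The scoring described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's modified WordCloud code. The helper names (`word_scores`, `cloud_scale`, `pearson`, `topic_correlations`) are hypothetical. The sketch also assumes the following:

- lightness is the word's count in the topic divided by its document frequency, rescaled to [0, 1];
- each cloud is scaled so that its area tracks the topic's share of the corpus;
- topic correlation is the Pearson correlation of topic proportions across documents.

```python
# Hypothetical sketch: per-word scores for one topic's tag cloud, a per-topic
# scale factor, and topic-topic correlations. All formulas are assumptions.
import math


def word_scores(topic_counts, doc_freq):
    """Return {word: (size, lightness)} for one topic.

    ASSUMPTION: size = word's share of the topic's tokens;
    lightness = topic count / document frequency, rescaled so the
    topic's most characteristic word gets 1.0.
    """
    total = sum(topic_counts.values())
    raw_light = {w: c / doc_freq[w] for w, c in topic_counts.items()}
    max_light = max(raw_light.values())
    return {
        w: (c / total, raw_light[w] / max_light)
        for w, c in topic_counts.items()
    }


def cloud_scale(topic_shares, base=1.0):
    """ASSUMPTION: side length proportional to sqrt(share), so cloud area
    tracks the topic's share of the corpus; the largest topic gets `base`."""
    largest = max(topic_shares)
    return [base * math.sqrt(s / largest) for s in topic_shares]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def topic_correlations(theta):
    """theta[d][k] = proportion of topic k in document d.

    Returns {(i, j): r} for i < j, correlating the topic columns
    across documents.
    """
    k = len(theta[0])
    cols = [[row[t] for row in theta] for t in range(k)]
    return {
        (i, j): pearson(cols[i], cols[j])
        for i in range(k)
        for j in range(i + 1, k)
    }


if __name__ == "__main__":
    counts = {"bank": 40, "rate": 30, "loan": 10}  # toy topic counts
    df = {"bank": 20, "rate": 30, "loan": 5}       # toy document frequencies
    print(word_scores(counts, df))
    print(cloud_scale([0.4, 0.1], base=1000))
    print(topic_correlations([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]))
```

With the stock WordCloud package, one plausible way to use these scores is to pass the sizes to `WordCloud.generate_from_frequencies` and map lightness to colour through a custom `color_func`. The post does not say whether the author's modification works this way.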
Below are the results for ABC news articles from their website, covering 2003 to 2012, collected by Dr. Jinjing Li of NATSEM in Canberra. These images are about 4000 by 4000 pixels, so you will not be able to view them properly unless you:
- get on a big screen,
- click on the image to enter image view mode,
- then scroll to the bottom right and click “View full size” to bring it up,
- and then zoom around to view.
To produce the banking image, I run the following commands with hca:
# generate the topic model into result set B1
hca -Ang -v -K50 -C1000 -q2 bank B1
# compute the diagnostics
hca -v -v -V -V -r0 -C0 bank B1
# generate the image
topset2word.pl --dot "-Kfdp" --lang png B1 BN1
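The last step hands the topic images and correlations to Graphviz, laid out with the `fdp` engine. The following is a hypothetical illustration only, and it is not the format `topset2word.pl` actually writes. It generates a DOT file that draws each topic as its tag-cloud PNG and keeps only the strongest correlations as edges. The function name `topic_graph_dot`, the per-topic edge limit `max_edges`, the threshold `min_r`, and the mapping from correlation to the `weight` and `len` edge attributes are all assumptions.

```python
# Hypothetical sketch of a Graphviz DOT file for a topic graph: nodes are
# tag-cloud images, edges are the strongest positive topic correlations.
# Not the output format of topset2word.pl.


def topic_graph_dot(images, correlations, max_edges=2, min_r=0.1):
    """images: list of PNG paths, one per topic.
    correlations: {(i, j): r} with i < j.
    ASSUMPTION: keep at most `max_edges` strongest edges per topic,
    and only those with r >= min_r (min_r should lie in (0, 1])."""
    kept = set()
    for t in range(len(images)):
        incident = sorted(
            (r, pair) for pair, r in correlations.items()
            if t in pair and r >= min_r
        )
        for _, pair in incident[-max_edges:]:  # strongest are at the end
            kept.add(pair)

    lines = ["graph topics {", '  node [shape=none, label=""];']
    for t, path in enumerate(images):
        lines.append(f'  t{t} [image="{path}"];')
    for i, j in sorted(kept):
        r = correlations[(i, j)]
        # For fdp/neato, `len` is the preferred edge length and a larger
        # `weight` pulls the edge closer to it. ASSUMPTION: stronger
        # correlation -> shorter, more strongly weighted edge.
        lines.append(f"  t{i} -- t{j} [weight={r:.3f}, len={2.0 - r:.2f}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(topic_graph_dot(
        ["t0.png", "t1.png", "t2.png"],
        {(0, 1): 0.5, (0, 2): -0.2, (1, 2): 0.05},
    ))
```

Saved as, say, `topics.dot`, the result could be rendered with `dot -Kfdp -Tpng topics.dot -o topics.png`. Capping edges per topic keeps a 50-topic graph readable, at the cost of hiding weaker correlations.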