Towards Coherent Topic Models
Edwin Bonilla (NICTA)
NICTA SML SEMINAR
DATE: 2013-06-27
TIME: 11:15:00 - 12:15:00
LOCATION: NICTA - 7 London Circuit
ABSTRACT:
Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, result set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. In this talk, I will describe two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflects broad patterns in external data. I will show that both regularizers improve topic coherence and interpretability on thirteen datasets while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data.
BIO:
http://users.cecs.anu.edu.au/~Edwin.Bonilla/





