Schedule

Speaker | Title | Time

 | Opening Remarks | 7:30 - 7:35
Hanna Wallach | Overview/survey of topic modeling research | 7:35 - 8:10
Eric Xing | Modeling Dynamic Network Tomography | 8:10 - 8:50
Diane Hu and Lawrence Saul | A Probabilistic Topic Model for Music Analysis | 8:50 - 9:05

 | Poster break | 9:05 - 9:35
Sean Gerrish and David Blei | Modeling Influence in Text Corpora | 9:35 - 9:50
Fei-Fei Li | From Bag-of-Words to Total Scene Understanding: Evolution of Topic Models in Visual Recognition | 9:50 - 10:30

 | Ski break | 10:30 - 3:30
Mark Johnson | Topic Models and Adaptor Grammars | 3:30 - 4:10
Gabriel Doyle and Charles Elkan | Financial Topic Models | 4:10 - 4:25
David Sontag and Daniel Roy | Complexity of Inference in Topic Models | 4:25 - 4:40
David Mimno | Reconstructing Pompeian Households | 4:40 - 4:55

 | Poster break | 4:55 - 5:25
Thomas Landauer | Modeling language learning: some history, commentary and news | 5:25 - 6:05

 | Panel session/Closing remarks | 6:05 - 6:30

 | Dinner at Kypriaki | 7:00 - ??

Abstracts for Invited Talks

Modeling Dynamic Network Tomography
Eric Xing, Carnegie Mellon University

A plausible representation of the relational information among entities in dynamic systems such as a social community or a living cell is a stochastic network that is topologically rewiring and semantically evolving over time. While there is a rich literature on modeling static or temporally invariant networks, until recently little had been done toward modeling the dynamic processes underlying rewiring networks. In this talk, I will present a model-based approach to analyze what we will refer to as the dynamic tomography of such time-evolving networks. This approach builds on a time-evolving mixed membership stochastic blockmodel, which is reminiscent of a dynamic topic model. It offers an intuitive but powerful tool to infer and visualize the semantic underpinnings of each actor, such as its social roles or biological functions, underlying the observed network topologies; and it overcomes a number of limitations of many current network inference techniques. I will show empirical analyses in which our model is applied to a social network among monks, a dynamic email communication network among Enron employees, and a rewiring gene interaction network of the fruit fly collected over its full life cycle. In all cases, our model reveals interesting patterns in the dynamic roles of the actors.
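For readers unfamiliar with the building block behind this talk, the following is a minimal sketch of the generative process of a static mixed membership stochastic blockmodel, which the dynamic model extends with temporal evolution. It is an illustration only, not the speaker's model; the number of actors, roles, and the block matrix below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 20, 3                      # actors and latent roles
alpha = np.full(K, 0.1)           # Dirichlet prior over role memberships
B = np.array([[0.90, 0.10, 0.05], # role-to-role interaction probabilities
              [0.10, 0.80, 0.05],
              [0.05, 0.05, 0.70]])

# Each actor has a mixed-membership vector over roles.
pi = rng.dirichlet(alpha, size=N)

# For every ordered pair of actors, each side picks a role for that
# interaction, and an edge is drawn from the corresponding block probability.
A = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(N):
        if i == j:
            continue
        z_send = rng.choice(K, p=pi[i])
        z_recv = rng.choice(K, p=pi[j])
        A[i, j] = rng.binomial(1, B[z_send, z_recv])

print(A.sum(), "edges generated")
```

The dynamic version described in the abstract lets the role memberships and interactions evolve over time, which this static sketch does not attempt to capture.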

From Bag-of-Words to Total Scene Understanding: Evolution of Topic Models in Visual Recognition
Fei-Fei Li, Stanford University

Starting from the original Bag of Words (BoW) formulation of images, the vision community has come a long way in using topic models to solve visual recognition problems. In this talk, I'll sample a number of representative works by us and others that illustrate this evolution. I will focus in particular on issues related to representing and learning high-level visual concepts such as scenes, objects, and pictures-and-words. I will show that, by using sophisticated representations of detailed image information, topic models can offer a powerful representation for scene context, object segmentation, annotation, and high-level visual concept understanding. Last but not least, I will discuss both the pros and cons of using topic models for vision.
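As a reference point for the BoW formulation mentioned above, here is a minimal sketch of how an image becomes a "document" of visual words: local descriptors are quantized against a learned codebook, and the image is summarized by a histogram of codeword counts. The random descriptors and codebook size are stand-in assumptions for illustration; real systems would use detected local features such as SIFT.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for local descriptors (e.g., 128-d SIFT) pooled from training
# images; in practice these come from an interest-point detector.
train_descriptors = rng.normal(size=(5000, 128))

# Build the visual codebook by clustering descriptors into "visual words".
n_visual_words = 200
codebook = KMeans(n_clusters=n_visual_words, n_init=4, random_state=0)
codebook.fit(train_descriptors)

def image_to_bow(descriptors):
    """Quantize one image's descriptors and return its visual-word histogram."""
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=n_visual_words)

new_image_descriptors = rng.normal(size=(300, 128))
bow = image_to_bow(new_image_descriptors)
print(bow.shape, bow.sum())       # (200,) histogram covering 300 descriptors
```

These histograms play the same role as word counts in text, which is what allows topic models to be applied to images at all.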

Topic Models and Adaptor Grammars
Mark Johnson, Brown University

Adaptor grammars are a non-parametric Bayesian extension of probabilistic context-free grammars that can express a variety of hierarchical Dirichlet or Pitman-Yor processes. Not surprisingly, adaptor grammars are closely related to topic models. After introducing adaptor grammars, this talk will focus on the relationship between adaptor grammars and topic models and describe what they have in common and the ways in which they differ.

Modeling language learning: some history, commentary and news
Thomas Landauer, University of Colorado at Boulder

History: In the 90s, while trying to overcome the vocabulary problem in information retrieval, we discovered that SVD combines words into passages in much the same way as humans do, thus "Latent Semantic Analysis." A spectrum of past applications will be mentioned.
Commentary: The big difference from TOPICS lies in their objective functions: how words combine versus how they cluster. A popular misunderstanding is that LSA measures how often words occur together in passages. It doesn't.
The news: A new LSA application measures the separate growth of knowledge for any individual student for every word in a corpus.
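For readers unfamiliar with the mechanics referred to in the history above: LSA factors a term-by-passage count matrix with a truncated SVD, and similarity is then measured in the resulting low-dimensional space. The toy corpus and dimensionality below are illustrative assumptions, not material from the talk.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# A toy corpus of "passages"; real LSA applications use many thousands.
passages = [
    "the doctor examined the patient",
    "the nurse helped the doctor",
    "the pilot flew the plane",
    "the plane landed at the airport",
]

# Passage-by-term counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(passages)

# Truncated SVD projects passages (and, via components_, terms) into a
# low-dimensional latent semantic space.
lsa = TruncatedSVD(n_components=2, random_state=0)
passage_vectors = lsa.fit_transform(X)

# Passages about related topics end up close together even when they share
# few literal words.
print(cosine_similarity(passage_vectors[:1], passage_vectors[1:]))
```

Note that the similarity comes from the low-rank factorization, not from raw co-occurrence counts, which is the misunderstanding the commentary above addresses.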

Abstracts for Contributed Talks

A Probabilistic Topic Model for Music Analysis
Diane Hu and Lawrence Saul, University of California, San Diego

We describe a probabilistic model for learning musical key-profiles from symbolic and audio files of polyphonic classical music. Our model is based on Latent Dirichlet Allocation (LDA), a statistical approach for discovering hidden topics in large corpora of text. In our adaptation of LDA, music files play the role of text documents, groups of musical notes play the role of words, and musical key-profiles play the role of topics. We show how these learned key-profiles can be used to determine the key of a musical piece and track its harmonic modulations.
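The document/word/topic mapping described in the abstract can be made concrete with a small sketch: each piece becomes a bag of pitch-class counts, and a standard LDA implementation is fit over those counts. The pitch-class vocabulary, the random counts, and the use of scikit-learn's LDA are illustrative assumptions, not the authors' actual system.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Vocabulary: the 12 pitch classes play the role of "words".
pitch_classes = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

# Each row is a "document": counts of how often each pitch class occurs in
# one piece (or one segment). Random counts stand in for counts extracted
# from MIDI or transcribed audio.
note_counts = rng.integers(0, 30, size=(50, 12))

# Topics then correspond to key-profile-like distributions over pitch classes.
lda = LatentDirichletAllocation(n_components=4, random_state=0)
piece_topic_mix = lda.fit_transform(note_counts)

for k, profile in enumerate(lda.components_):
    top = np.argsort(profile)[::-1][:3]
    print(f"topic {k}: strongest pitch classes =",
          [pitch_classes[i] for i in top])
```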

Modeling Influence in Text Corpora
Sean Gerrish and David Blei, Princeton University

Identifying the most influential documents in a corpus is an important problem in fields ranging from information science and historiography to text summarization and news aggregation. We propose using changes in the linguistic content of these documents over time to predict the importance of individual documents within the collection, and we describe a dynamic topic model for both quantifying and qualifying the impact of each document in the corpus.

Financial Topic Models
Gabriel Doyle and Charles Elkan, University of California, San Diego

We apply topic models to financial data to obtain a more accurate view of economic networks than that supplied by traditional economic statistics. The learned topic models can serve as a substitute for, or a complement to, more complicated network analysis. Initial results on S&P 500 stock market data show that topic models recover meaningful stock categories without supervision and show promise in revealing network-like statistics about the stock market. We also discuss the characteristics of an ideal topic model for financial data.

Complexity of Inference in Topic Models
David Sontag and Daniel Roy, Massachusetts Institute of Technology

We consider the computational complexity of finding the MAP assignment of topics to words in Latent Dirichlet Allocation. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment in LDA is NP-hard. Our results motivate further study of the structure in real-world topic models, and raise a number of questions about the requirements for accurate inference during both learning and test-time use of topic models.
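To make the MAP problem studied here concrete: given topic-word probabilities and a symmetric Dirichlet prior over a document's topic proportions, the goal is the joint topic assignment for the document's words with the highest posterior probability. The brute-force search below simply enumerates all K^N assignments of a toy document to illustrate the objective; it is an illustrative sketch under arbitrary assumptions, not the authors' polynomial-time algorithm for the few-topics case.

```python
import itertools
import numpy as np
from scipy.special import gammaln

K, V = 3, 5                 # topics, vocabulary size
alpha = 0.1                 # symmetric Dirichlet prior on topic proportions
rng = np.random.default_rng(0)

# Topic-word distributions (rows sum to one); random stand-ins for learned ones.
beta = rng.dirichlet(np.ones(V), size=K)

doc = [0, 0, 2, 4, 4, 1]    # word ids of one short document
N = len(doc)

def log_joint(z):
    """log p(w, z) with the document's topic proportions integrated out."""
    counts = np.bincount(z, minlength=K)
    log_pz = (gammaln(K * alpha) - gammaln(N + K * alpha)
              + np.sum(gammaln(counts + alpha) - gammaln(alpha)))
    log_pw = sum(np.log(beta[z[n], doc[n]]) for n in range(N))
    return log_pz + log_pw

# Brute force over all K**N assignments; feasible only at toy sizes, which is
# exactly why the complexity question raised in the abstract matters.
best = max(itertools.product(range(K), repeat=N), key=log_joint)
print("MAP assignment:", best, "log-probability:", log_joint(best))
```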

Reconstructing Pompeian Households
David Mimno, University of Massachusetts, Amherst

A database of objects discovered in houses in the Roman city of Pompeii provides a unique view of ordinary life in an ancient city. Experts have used this collection to study the structure of Roman households, exploring the distribution and variability of tasks in architectural spaces, but such approaches are necessarily affected by modern cultural assumptions. In this study, we present a data-driven approach to household archeology that treats it as an unsupervised labeling problem and attempts to provide a more objective complement to human interpretation.

Schedule of Poster Sessions

Session 1

  • Spherical Topic Models - Joseph Reisinger, Austin Waters, Bryan Silverthorn, Raymond Mooney
  • Undirected Topic Models - Ruslan Salakhutdinov, Geoffrey Hinton
  • Generating Status Hierarchies from Meeting Transcripts Using the Author-Topic Model - David Broniatowski
  • Software Analysis with Unsupervised Topic Models - Erik Linstead, Lindsey Hughes, Cristina Lopes, Pierre Baldi
  • Adaptation of Topic Model to New Domains Using Recursive Bayes - Ying-Lang Chang, Jen-Tzung Chien
  • Modeling Shared Tastes in Online Communities - Laura Dietz
  • Application of Lexical Topic Models to Protein Interaction Sentence Prediction - Tamara Polajnar, Mark Girolami
  • A Time and Space Dependent Topic Model for Unsupervised Activity Perception in Video - Eric Wang, Lawrence Carin
  • Audio Scene Understanding Using Topic Models - Samuel Kim, Shiva Sundaram, Panayiotis Georgiou, Shrikanth Narayanan
  • Stopwords and Stylometry: A Latent Dirichlet Allocation Approach - Arun R., Saradha R., V. Suresh, C.E. Veni Madhavan, M. Narasimha Murty
  • Learning to Summarize Using Coherence - Pradipto Das, Rohini Srihari
  • Focused Topic Models - Sinead Williamson, Chong Wang, Katherine Heller, David Blei
  • Applications of Topic Models to Analysis of Disaster-Related Twitter Data - Kirill Kireyev, Leysia Palen, Kenneth Anderson
  • A Semantic Question / Answering System Using Topic Models - Asli Celikyilmaz
  • Finding Topics in Emails: Is LDA Enough? - Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond Ng
  • A Probabilistic Topic Model for Music Analysis - Diane Hu, Lawrence Saul

Session 2

  • Topic Models for Audio Mixture Analysis - Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj
  • Timelines: Revealing the Birth and Evolution of Ideas in Text Stream Using Infinite Dynamic Topic Models - Amr Ahmed, Eric Xing
  • Modeling Concept-Attribute Structure - Joseph Reisinger, Marius Pasca
  • Segmented Topic Model for Text Classification and Speech Recognition - Chuang-Hua Chueh, Jen-Tzung Chien
  • Writer Identification in Offline Handwriting Using Topic Models - Anurag Bhardwaj, Manavender Malgireddy, Venu Govindaraju
  • Implicit Communication Detection Using Topics Model on Asynchronous Communication Data - Charles Panaccione, Peter Folz
  • Topic Modeling for the Social Sciences - Daniel Ramage, Evan Rosen, Jason Chuang, Chris Manning, Daniel McFarland
  • Author Disambiguation: A Nonparametric Topic and Co-authorship Model - Andrew Dai, Amos Storkey
  • Speeding Up Gibbs Sampling by Variable Grouping - Evgeniy Bart
  • Modeling Tag Dependencies in Tagged Documents - Timothy Rubin, America Holloway, Padhraic Smyth, Mark Steyvers
  • Data Portraiture and Topic Models - Aaron Zinman, Doug Fritz
  • Who Talk to Whom: Modeling Latent Structures in Dialogue Documents - Bailu Ding, Jiang-Ming Yang, Chong Wang, Rui Cai, Zhiwei Li, Lei Zhang 
  • Topic Models for Semantically Annotated Document Collections - Markus Bundschus, Volker Tresp, Hans-Peter Kriegel 
  • Complexity of Inference in Topic Models - David Sontag, Daniel M. Roy
  • Modeling Influence in Text Corpora - Sean Gerrish, David Blei