Department of Computer Science News

Chibuike Muoh Thesis Defense

Posted Nov. 11, 2009
Sparsification for Topic Modeling and Applications to Information Retrieval

Date: Thursday, November 12, 2009
Time: 2:00
Place: 274 MSB

Committee Members:
Ruoming Jin, Advisor
Yuri Breitbart
Ye Zhao

In this thesis, we tackle the problem of improving the probabilistic topic models used in information retrieval and text mining tasks. A critical milestone in developing accurate models lies in understanding the latent relationships between documents. Probabilistic topic models such as PLSA mark a different approach to natural language modeling than earlier techniques based on Boolean models.

We observe that although topic models such as PLSA have been used successfully in the text mining community, it is widely known that they suffer from an over-fitting problem. In this thesis we propose novel sparsification techniques that improve modeling accuracy by removing spurious parameters from the model. L0-optimization is a post-processing sparsification approach that prunes parameters using information-theoretic measures. The L2-sparsification approach instead reformulates the document likelihood equation so that spurious parameters are removed simultaneously with model fitting.
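To illustrate the setting, the following is a minimal sketch of PLSA fitted by EM on a document-term count matrix, followed by a simple post-hoc pruning step. This is an illustration only, not the thesis's actual algorithms: the `plsa` and `sparsify` functions, the threshold-based pruning rule, and all parameter names are assumptions standing in for the L0/L2 techniques the abstract describes.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (n_docs, n_words) count matrix.

    Returns P(z|d) of shape (n_docs, n_topics) and
    P(w|z) of shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random normalized initialization of the two parameter tables.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # shape (d, z, w)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both tables from expected counts.
        ec = counts[:, None, :] * joint                 # shape (d, z, w)
        p_z_d = ec.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
        p_w_z = ec.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

def sparsify(p_w_z, threshold=1e-3):
    """Post-processing sparsification (illustrative): zero out small
    P(w|z) entries, then renormalize each topic's distribution."""
    pruned = np.where(p_w_z < threshold, 0.0, p_w_z)
    return pruned / (pruned.sum(axis=1, keepdims=True) + 1e-12)

# Toy corpus: two apparent topics (words 0-1 vs. words 2-3).
counts = np.array([[5, 3, 0, 0],
                   [4, 4, 1, 0],
                   [0, 1, 5, 4],
                   [0, 0, 4, 5]], dtype=float)
p_z_d, p_w_z = plsa(counts, n_topics=2)
p_w_z_sparse = sparsify(p_w_z, threshold=0.05)
```

Pruning near-zero entries of P(w|z) after fitting mimics the post-processing flavor of sparsification; the reformulated-likelihood (L2) approach would instead discourage such parameters during estimation itself.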