Department of Computer Science News
Chibuike Muoh Thesis Defense
Posted Nov. 11, 2009
Date: Thursday, November 12, 2009
Place: 274 MSB
Ruoming Jin, Advisor
In this thesis, we tackle the problem of improving the probabilistic topic models used in information retrieval and text mining tasks. A critical step toward building accurate models lies in understanding the latent relationships between documents. Probabilistic topic models such as PLSA take a different approach to natural language modeling than earlier techniques based on Boolean models.
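For context, the standard PLSA formulation (following Hofmann's original model; the notation here is not necessarily that of the thesis) expresses the probability of observing word w in document d as a mixture over latent topics z:

$$P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)$$

Here P(w | z) are the topic-word parameters and P(z | d) the per-document topic proportions; because the number of P(z | d) parameters grows with the number of documents, the model is prone to over-fitting.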
We observe that although topic models such as PLSA have been used successfully in the text mining community, it is widely known that they suffer from an over-fitting problem. In this thesis we propose novel sparsification techniques that improve modeling accuracy while removing spurious parameters from the model. L0-optimization is a post-processing sparsification approach that prunes the fitted language model using information-theoretic measures. The L2-sparsification approach reformulates the document likelihood equation so that spurious parameters are removed simultaneously with model estimation.
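As a rough illustration of the post-processing idea only, the sketch below zeroes out low-probability entries of a fitted topic-word distribution and renormalizes. The threshold rule, function name, and toy matrix are assumptions for illustration; they are not the thesis's actual information-theoretic L0 criterion.

```python
import numpy as np

def sparsify_topic_word(phi, threshold=1e-3):
    """Post-processing sparsification sketch (hypothetical).

    phi: array of shape (num_topics, vocab_size) holding P(w | z)
         from an already-fitted PLSA model.
    Entries below `threshold` are treated as spurious parameters,
    set to zero, and each topic's distribution is renormalized.
    """
    phi_sparse = np.where(phi < threshold, 0.0, phi)
    # Renormalize each topic so it remains a probability distribution.
    row_sums = phi_sparse.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0  # guard against all-zero topics
    return phi_sparse / row_sums

# Example: a toy 2-topic, 5-word model with a few near-zero parameters.
phi = np.array([[0.60, 0.39, 0.0005, 0.0003, 0.0092],
                [0.0004, 0.25, 0.35, 0.3990, 0.0006]])
print(sparsify_topic_word(phi))
```

In contrast, an in-fitting approach like the L2-sparsification described above would penalize or reformulate the likelihood during estimation rather than pruning parameters after the fact.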