Transactions on Machine Learning and Data Mining (ISSN: 1865-6781)
Volume 4 - Number 2 - October 2011 - Pages 55-74
Discovering Text Patterns by a New Graphical Model
M. Huang and R.M. Haralick
Computer Science Department, The Graduate School and University Center, The City University of New York, New York, NY 10016, USA
We discuss a probabilistic graphical model that recognizes three types of text patterns in a sentence: noun phrases, the meaning of an ambiguous word, and the semantic arguments of a verb. The model has a unique mathematical expression and graphical representation compared to existing graphical models such as CRFs, HMMs, and MEMMs. In our model, the optimal sequence of categories for a sequence of symbols is determined by finding the optimal category for each symbol independently. Two consequences follow. First, the model does not need dynamic programming, which reduces the on-line time complexity and memory complexity. Second, the misclassification rate is smaller than that obtained by CRFs, HMMs, or MEMMs. Experiments conducted on standard data sets show good results: the performance on each task surpasses or approaches the state-of-the-art level.
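The idea of choosing each symbol's category independently can be illustrated with a minimal sketch. It is not the authors' model: it assumes that a score for every (position, category) pair has already been computed somehow, and the function name `decode_independently`, the BIO-style category names, and the score values are all hypothetical. The sketch only shows why no dynamic-programming table is needed once decoding factorizes over positions.

```python
from typing import List, Sequence


def decode_independently(
    scores: Sequence[Sequence[float]],
    categories: Sequence[str],
) -> List[str]:
    """Pick the highest-scoring category at each position on its own.

    scores[i][k] is an assumed, precomputed score of category k at
    position i. Each position is handled in isolation, so the cost is
    one pass over the score table and no table of partial paths is kept.
    """
    labels = []
    for row in scores:
        # Index of the best category for this position only.
        best = max(range(len(categories)), key=lambda k: row[k])
        labels.append(categories[best])
    return labels


# Hypothetical categories and scores for a three-symbol sentence.
CATEGORIES = ["B", "I", "O"]
example_scores = [
    [0.7, 0.1, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]
decoded = decode_independently(example_scores, CATEGORIES)
print(decoded)  # ['B', 'I', 'O']
```

By contrast, Viterbi-style decoding for a chain model has to keep the best partial path for every category at every position, which is the dynamic-programming cost that the abstract says the model avoids.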