Text Representation
Vector Space model
The ignorance of any relation between words. Learn algorithm could only detect terminology, conceptual patterns (synatatic meaning) are totally ignored.
Dimensionality problem
n-grams model is a contiguous sequence of $n$ items from a given sequence of text or speech. An n-grams model is a type of probabilistic language model for predicting the next item in such a sequence in the form of a (n - 1) other Markov Model
Two advantages of n-gram models: