Clustering methods are more challenging than classification methods, because they are conducted in a fully unsupervised way. Moreover, most traditional clustering algorithms cannot be directly used for search result clustering because of some practical issues.
For example, the algorithm should take the document snippets instead of the whole documents as input, since the downloading of original documents is time-consuming. The clustering algorithm should be fast enough for online calculation.
Cluster Documents into topically-coherent groups according to content similarity and generate descriptive summaries for clusters.
However, these summaries are often unreadable, which make it difficult for web users to identify relevant clusters.
STC not treats documents as a collection of words but as a string of words. On that way thus operates using the proximity information between words. STC uses suffix tree structure to efficiently identify sets of documents that share common phrases and terms and uses this information to create clusters and to concisely present their content.
STC meanly includes four logical steps:
STC has linear complexity $O(n)$
Linear complexity puts STC in the top of clustering algorithms. The low complexity return fast search in suffix tree. Because of that reason, this algorithm is used in online clustering.