Abstract
The statistical characteristics of the dimensions of latent semantic analysis (LSA) space were studied in order to cluster documents automatically at different concept levels. The study concludes that dimensions corresponding to larger singular values describe what semantic elements have in common, while dimensions corresponding to smaller singular values describe how they differ. This suggests a latent relation between the dimensions of LSA space and concept granularity in natural language, so different dimensions of LSA space are used to cluster documents at a given concept granularity. Experimental results agree well with this idea. In addition, in the LSA-based document clustering algorithm, better clustering precision is obtained by clustering the row vectors of the document self-indexing matrix rather than the low-dimensional document vectors.
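The two document representations contrasted in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not define the document self-indexing matrix, so the code assumes it is the document-by-document matrix of inner products in the rank-k LSA space; the function name `lsa_document_representations`, the toy counts, and the choice of `k` are all hypothetical and not taken from the paper.

```python
import numpy as np

def lsa_document_representations(term_doc, k):
    """Return (doc_vectors, self_index) for a rank-k LSA truncation."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Low-dimensional document vectors: one row per document,
    # each of the k coordinates scaled by its singular value.
    doc_vectors = Vt[:k].T * s[:k]
    # ASSUMPTION: the "document self-indexing matrix" is taken to be the
    # document-by-document inner-product matrix V_k S_k^2 V_k^T.
    self_index = doc_vectors @ doc_vectors.T
    return doc_vectors, self_index

# Made-up term-document counts (rows: terms, columns: documents).
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 1, 1, 1],
], dtype=float)

doc_vectors, self_index = lsa_document_representations(term_doc, k=2)
print(doc_vectors.shape)  # one k-dimensional vector per document
print(self_index.shape)   # one row per document, indexed against all documents
```

Under this assumption, either the rows of `doc_vectors` or the rows of `self_index` could be passed to a clustering routine, and varying `k` changes which singular dimensions contribute to the representation.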
Original language | English (US) |
---|---|
Pages (from-to) | 1783-1786 |
Number of pages | 4 |
Journal | Qinghua Daxue Xuebao/Journal of Tsinghua University |
Volume | 45 |
Issue number | SUPPL. |
State | Published - Sep 1 2005 |
Externally published | Yes |
Keywords
- Concept granularity
- Document clustering
- Document self-indexing matrix
- Information processing
- Latent semantic analysis
ASJC Scopus subject areas
- General Engineering
- Computer Science Applications
- Applied Mathematics