Automatic Clustering
From Saltlux
1. Definition of Document Clustering
Document clustering, or text clustering, is closely related to data clustering.
Data clustering is a technique that divides data into groups so that the items within each group are similar to one another, that is, they share common characteristics.
The similarity between data items can be calculated with a distance measure.
Clustering makes it possible to understand the characteristics of a set of search results.
Document clustering may be used to organize unstructured documents, extract topics automatically, speed up information retrieval, or build filtering systems.
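As an illustration of the distance measures mentioned above, the following minimal sketch computes two common ones for documents represented as numeric vectors. The text does not say which measure is used, so Euclidean distance and cosine similarity are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Cosine similarity is often preferred for text because it ignores document length and compares only the direction of the vectors. A smaller Euclidean distance, or a larger cosine similarity, indicates more similar documents.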
2. Description of Document Clustering
Feature extraction derives the keyword weights of each document, and documents with similar features are then clustered together.
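The keyword weighting step can be sketched as follows. The text does not name a weighting scheme, so TF-IDF is used here as an assumption. Whitespace tokenization and the function name tfidf_vectors are also illustrative choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dictionary per input document."""
    # Assumption: lowercase whitespace tokenization is good enough here.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors
```

A term that occurs in every document receives a weight of zero, so only the keywords that distinguish one document from the others carry weight.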
A web search engine often returns thousands of pages of results, which makes it difficult to pick out the relevant information. Clustering technology helps users with this problem by grouping the retrieved documents into semantic categories.
3. Document Clustering Techniques
Document clustering is unsupervised learning, because it groups documents without any prior classification knowledge. In other words, clusters are created from similarity alone, with no prior knowledge about the samples. A cluster is a collection of a limited number of patterns that lie close together in a given pattern space, and the process of forming such collections is called clustering.
Clustering divides data into several subsets, based on the similarity or proximity of shared characteristics as calculated by a distance measure.
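The division of data into subsets by a distance measure can be sketched with k-means. The text does not say which algorithm is used, so k-means is an assumption here. The deterministic choice of initial centroids and the names kmeans and squared_distance are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Group points into k clusters and return one cluster label per point."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

No labeled examples are supplied at any point, which is what makes the procedure unsupervised: the groups emerge only from the distances between the points.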
When a search first returns its results, the TMS engine derives a vector for each document from that document's feature values and places it in the feature space.
The position of each document's vector in this space is determined, and these positions form clusters.
Once the documents have formed distinct clusters in the space, representative keywords are extracted for each cluster.
When an extracted cluster is divided further, it forms several sub-clusters, and keywords representing each sub-cluster are shown.
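The extraction of representative keywords can be sketched as follows. How the TMS engine actually selects keywords is not described, so ranking terms by their summed weight within a cluster is an assumption. The function name representative_keywords and the {term: weight} input format are hypothetical.

```python
from collections import defaultdict


def representative_keywords(vectors, labels, top_n=3):
    """Return {cluster_label: [keywords]} ranked by summed term weight.

    vectors: one {term: weight} dictionary per document (assumed format).
    labels:  the cluster label assigned to each document.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for vector, label in zip(vectors, labels):
        for term, weight in vector.items():
            totals[label][term] += weight
    return {
        label: [
            term
            # Highest weight first; ties are broken alphabetically.
            for term, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        ]
        for label, weights in totals.items()
    }
```

To label sub-clusters, the same function could be applied again to the documents of a single cluster after that cluster has been divided further.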
< Case of clusty.com >
< View by topics of KERIS >
