Semantic Similarity for Document Clustering using TF-IDF and K-means
2021-08-20
The continued growth of the internet has greatly increased the number of text
documents in electronic form, and techniques for grouping these documents into
meaningful clusters have become critical. Clustering is a data mining
technique that organizes data by gathering similar items into groups.
Traditional document clustering methods were based on statistical features,
so clustering relied on syntactic rather than semantic notions. As a result,
these methods often gathered dissimilar documents into the same group because
of polysemy (one word with several meanings) and synonymy (several words with
one meaning). These problems can be addressed by clustering based on semantic
similarity.
This thesis proposes a system to cluster documents based on semantic
similarity.
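
The polysemy and synonymy problems described above can be illustrated with a minimal sketch. It is not the system proposed in this thesis. It assumes a plain TF-IDF weighting (term frequency times log(N / df)) and cosine similarity, and the four toy documents and the helper names `tfidf_vectors` and `cosine` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus (an assumption for illustration only).
docs = [
    "car engine repair",             # same topic as the next document...
    "automobile motor maintenance",  # ...but no shared terms (synonymy)
    "river bank flood",              # different topic from the next document...
    "bank loan interest",            # ...but shares the word "bank" (polysemy)
]

vecs = tfidf_vectors(docs)
synonymy_sim = cosine(vecs[0], vecs[1])
polysemy_sim = cosine(vecs[2], vecs[3])
print("synonymy pair:", synonymy_sim)
print("polysemy pair:", polysemy_sim)
```

In this sketch the two documents about vehicles receive a similarity of zero because they share no terms, while the two unrelated "bank" documents receive a positive similarity. A purely term-based clustering step would therefore separate the first pair and could group the second, which is the behavior that a semantic similarity measure is meant to correct.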