Rowaida Khaleel Ibrahim

Thesis

2021

Semantic Similarity for Document Clustering using TFIDF and K-mean

2021-08-20

The continued growth of the internet has greatly increased the number of text documents available in electronic form, and techniques for grouping these documents into meaningful clusters have become critical. Clustering is a data mining technique that gathers similar data into groups. Traditional document clustering methods were based on statistical features, so clustering relied on syntactic rather than semantic notions. As a result, these techniques often placed dissimilar documents in the same group because of polysemy and synonymy. These problems can be addressed by clustering based on semantic similarity. This thesis proposes a system that clusters documents based on semantic similarity.
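
The synonymy problem described in the abstract can be illustrated with a minimal sketch. This is not the system proposed in the thesis; it assumes whitespace tokenization, a plain tf * log(N / df) weighting, and three made-up toy documents, and the function names are hypothetical. It shows that a purely term-based TF-IDF representation gives zero cosine similarity to two sentences that mean roughly the same thing but use different words.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents (assumed for illustration only).
docs = [
    "the car needs new engine",
    "the automobile requires replacement motor",  # same meaning, different words
    "the car engine is new",
]
vectors = tfidf_vectors(docs)

sim_synonyms = cosine(vectors[0], vectors[1])  # only "the" is shared, and its idf is 0
sim_overlap = cosine(vectors[0], vectors[2])   # shares "car", "new", "engine"
print(sim_synonyms, sim_overlap)
```

In this sketch the first two documents score 0.0 even though they are paraphrases, while the first and third receive a positive score because they share surface terms. A clustering step such as K-means run on these vectors would inherit the same limitation, which is the gap that a semantic similarity measure is meant to close.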