Clustering Document Based on Semantic Similarity Using Graph Base Spectral
2022-06
Fifth International Iraqi Conference on Engineering Technology (5th IICETA-2022)_online
The Internet's continued growth has resulted in a significant rise in the number of electronic text documents, and grouping these documents into meaningful collections has become crucial. Older approaches to grouping and categorizing documents, based on statistical characteristics, relied on syntactic rather than semantic information. This article introduces an approach for grouping texts based on their semantic similarity. Document summaries are extracted from the Wikipedia and IMDB databases, with the NLTK dictionary used to generate them. A vector space is then modelled using TF-IDF, and clustering is performed using spectral methods. The results are compared with previous work.
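The TF-IDF and spectral steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy documents, the whitespace tokenization, the smoothed IDF variant, the cosine-similarity affinity, and the helper names (`tfidf_vectors`, `cosine`, `spectral_bipartition`) are all assumptions made for illustration. The NLTK-based summary step is omitted, and the split is limited to two clusters.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (as a dict) per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF: an assumed variant, the abstract does not specify one.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spectral_bipartition(affinity, iters=500, seed=0):
    """Two-way spectral split: sign of the second eigenvector of D^-1/2 W D^-1/2."""
    n = len(affinity)
    deg = [sum(row) for row in affinity]
    m = [[affinity[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of m is proportional to sqrt(deg) (eigenvalue 1).
    scale = math.sqrt(sum(deg))
    top = [math.sqrt(d) / scale for d in deg]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the leading eigenvector, then apply one power-iteration step.
        proj = sum(a * b for a, b in zip(x, top))
        x = [a - proj * b for a, b in zip(x, top)]
        x = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in x))
        x = [a / norm for a in x]
    return [0 if a >= 0 else 1 for a in x]


# Hypothetical toy corpus: two film-related and two biology-related texts.
docs = [
    "the film director shot the film".split(),
    "the actor starred in the film".split(),
    "the cell membrane protects the cell".split(),
    "protein inside the cell membrane".split(),
]
vectors = tfidf_vectors(docs)
affinity = [[cosine(u, v) for v in vectors] for u in vectors]
labels = spectral_bipartition(affinity)
print(labels)
```

The cluster numbering (which group is labelled 0) is arbitrary. For more than two clusters, or for real corpora, a library such as scikit-learn (`TfidfVectorizer`, `SpectralClustering`) would be the more practical choice.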
Semantic Similarity for Document Clustering using TFIDF and K-mean
2022-03
University of Zakho, Department of Computer
Clustering is an unsupervised learning problem. Its main task is to group similar data so that the items in the same cluster are more similar to each other than to those in other clusters. Text document clustering is an important technique that converts a large dataset of documents into meaningful clusters, such that the documents in the same cluster are more similar to each other. However, in traditional text document clustering, documents were clustered without any description of their concepts: conceptual similarity was ignored, which caused dissimilar documents to end up in the same cluster. Semantic Web technology plays a major role in solving this problem.