Diman Siddiq Hassan

Published Journal Articles

2025

THE EFFECT OF FEATURE SELECTION METHODS ON MACHINE LEARNING MODEL PERFORMANCE: A COMPARATIVE STUDY FOR BREAST CANCER PREDICTION

2025-01

Science Journal of University of Zakho (Issue : 2025) (Volume : 13)

Developing countries often face a high incidence of breast cancer, making early detection vital for effective treatment. The risk of developing breast cancer can be evaluated using machine learning methods and regular diagnostic data. Cancer datasets contain a wealth of patient information, but not all of it is valuable for predicting cancer, which highlights the significance of feature selection methods in identifying the relevant data. Many studies in this field have attempted to predict the different types of breast tumours, since accurate diagnosis of breast cancer is important. This paper compares the effect of different feature selection methods on the accuracy of various existing machine learning algorithms. The study focuses on seven machine learning algorithms: K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Trees (DT), Support Vector Machines (SVM), Logistic Regression (LR), Neural Network (NN), and Random Forest (RF). The feature selection techniques examined are F-test Feature Selection, Mutual Information (MI), and the Spearman Correlation Coefficient. The dataset used for the experiments is the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which is publicly available from the UCI Repository. The findings reveal that, when feature selection is applied, the LR and NN algorithms achieve higher accuracy than the other models and also perform well on the other evaluation metrics.
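As an illustration of one of the feature selection techniques named in the abstract, the following is a minimal sketch of Spearman-based feature ranking in plain Python. The toy feature names, the toy values, and the helper names (`ranks`, `spearman`, `select_top_k`) are assumptions made for this example; they are not taken from the paper, and the real experiments use the WDBC dataset rather than this toy data.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_top_k(features, labels, k):
    """Keep the k features with the largest absolute Spearman correlation."""
    scores = {name: abs(spearman(col, labels)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: three features for six samples, binary labels.
features = {
    "radius": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "texture": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
    "noise": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_k(features, labels, k=1)
print(selected)
```

In a comparison like the one described, the selected columns would then be passed to each classifier, and the same procedure would be repeated with the other scoring functions.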

2022

LINK PREDICTION IN CO-AUTHORSHIP NETWORKS: A REVIEW

2022-10

Science Journal of University of Zakho (Issue : 12) (Volume : 10)

Besides social network analysis, the Link-Prediction (LP) problem has useful applications in information retrieval, bioinformatics, telecommunications, microbiology, and e-commerce, where it forecasts future links in a given context by identifying likely connections from a local and global statistical analysis of the given graph data. In Academic Social Networks (ASNs), the LP problem has recently attracted a lot of attention in academia and has called for a variety of link prediction techniques to predict co-authorship among researchers and to examine the rich structural and associated data. Accordingly, this study investigates the LP problem in ASNs to forecast upcoming co-authorships among researchers. In a systematic approach, this review presents, analyses, and compares the primary taxonomies of topological-based, content-based, and hybrid-based approaches, which are used for computing similarity scores for each pair of unconnected nodes. The study ends with findings on challenges and open problems for the community to work on for further development of the LP problem in scholarly social networks.
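To make the idea of topological similarity scores for unconnected node pairs concrete, here is a minimal sketch using three widely known local indices: common neighbours, the Jaccard coefficient, and Adamic-Adar. The abstract does not say which indices the review covers, so this choice is an assumption, and the author names and the co-authorship edges below are hypothetical.

```python
from itertools import combinations
from math import log


def build_graph(edges):
    """Build an undirected adjacency map from (author, author) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def common_neighbors(graph, u, v):
    """Number of co-authors that u and v share."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared co-authors divided by all co-authors of either node."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def adamic_adar(graph, u, v):
    """Shared co-authors weighted so that low-degree ones count more."""
    return sum(
        1.0 / log(len(graph[w]))
        for w in graph[u] & graph[v]
        if len(graph[w]) > 1
    )


def rank_candidates(graph, score):
    """Score every unconnected pair and sort from most to least likely."""
    pairs = [
        (u, v)
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]
    ]
    return sorted(pairs, key=lambda pair: score(graph, *pair), reverse=True)


# Hypothetical co-authorship edges.
edges = [
    ("Ana", "Ben"),
    ("Ana", "Cy"),
    ("Ben", "Cy"),
    ("Cy", "Dee"),
    ("Dee", "Eli"),
]
graph = build_graph(edges)

for pair in rank_candidates(graph, adamic_adar):
    print(pair, round(adamic_adar(graph, *pair), 3))
```

Content-based and hybrid approaches would replace or combine these scores with similarity computed from node attributes such as keywords or venues.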

Heart disease prediction based on pre-trained deep neural networks combined with principal component analysis

2022-08

Biomedical Signal Processing and Control

Heart Disease (HD) is often regarded as one of the deadliest human diseases. Therefore, early prediction of HD risks is crucial for prevention and treatment. Unfortunately, current clinical procedures for diagnosing HD are costly and often require expert-level intervention. In response to this issue, researchers have recently developed various intelligent systems for the automated diagnosis of HD. Among the developed approaches, those based on artificial neural networks (ANNs) have gained more popularity due to their promising prediction results. However, to the authors’ knowledge, no research has attempted to exploit ANNs for feature extraction. Hence, research into bridging this gap is worthwhile to achieve better predictions. Motivated by this fact, this research proposes a new approach for HD prediction, utilizing a pre-trained Deep Neural Network (DNN) for feature extraction, Principal Component Analysis (PCA) for dimensionality reduction, and Logistic Regression (LR) for prediction. Cleveland, a publicly accessible HD dataset, was used to investigate the efficacy of the proposed approach (DNN + PCA + LR). Experimental results revealed that the proposed approach performs well on both the training and testing data, with accuracy rates of 91.79% and 93.33%, respectively. Furthermore, the proposed approach exhibited better performance than the state-of-the-art approaches under most of the evaluation metrics used.
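The three stages named in the abstract (DNN feature extraction, PCA, LR) can be sketched with scikit-learn as follows. This is not the authors' implementation: the data is synthetic rather than the Cleveland dataset, "pre-trained" is interpreted here as simply training a small network first on the training split, and the layer size, number of components, and the helper `hidden_features` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the Cleveland dataset).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardise using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Stage 1: train a small network, then reuse its hidden layer as an extractor.
dnn = MLPClassifier(
    hidden_layer_sizes=(32,), activation="relu", max_iter=500, random_state=0
)
dnn.fit(X_train_s, y_train)


def hidden_features(model, data):
    """Return the ReLU activations of the model's first hidden layer."""
    return np.maximum(0.0, data @ model.coefs_[0] + model.intercepts_[0])


H_train = hidden_features(dnn, X_train_s)
H_test = hidden_features(dnn, X_test_s)

# Stage 2: reduce the extracted features with PCA (fit on training data only).
pca = PCA(n_components=8).fit(H_train)
Z_train = pca.transform(H_train)
Z_test = pca.transform(H_test)

# Stage 3: logistic regression on the reduced features.
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
train_accuracy = clf.score(Z_train, y_train)
test_accuracy = clf.score(Z_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

The accuracies printed here describe only the synthetic data and say nothing about the results reported in the paper.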