
Published Journal Articles

2024

Enhanced Conjugate Gradient Method for Unconstrained Optimization and Its Application in Neural Networks

2024-10
European Journal of Pure and Applied Mathematics (Issue : 4) (Volume : 17)
In this study, we present a novel conjugate gradient method specifically designed for unconstrained optimization problems. Traditional conjugate gradient methods have shown effectiveness in solving optimization problems, but they may encounter challenges when dealing with unconstrained problems. Our method addresses this issue by introducing modifications that enhance its performance in the unconstrained setting. We demonstrate that, under certain conditions, our method satisfies the sufficient descent criterion and achieves global convergence, ensuring progress towards the optimal solution at each iteration and providing confidence in its ability to find the global optimum. To showcase the practical applicability of our approach, we apply the method to a dataset, using a feed-forward neural network to estimate the values of a continuous trigonometric function. To evaluate the efficiency and effectiveness of our modified approach, we conducted numerical experiments on a set of well-known test functions. These experiments reveal that our algorithm significantly reduces computational time due to its faster convergence rates and increased speed in directional minimization. These compelling results highlight the advantages of our approach over traditional conjugate gradient methods in the context of unconstrained optimization problems.
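For illustration, a minimal nonlinear conjugate gradient loop is sketched below. The paper's modified update coefficient is not reproduced here, so a standard Polak-Ribière+ coefficient with Armijo backtracking stands in; the Rosenbrock function is used purely as a familiar test problem.

```python
import numpy as np

def conjugate_gradient(f, grad, x0, tol=1e-6, max_iter=1000):
    """Nonlinear CG with a Polak-Ribiere+ update (a stand-in for the
    paper's modified coefficient, which is not given here)."""
    x = x0.astype(float)
    g = grad(x)
    d = -g                                   # initial direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                       # safeguard: restart if not a descent direction
            d = -g
        # backtracking line search satisfying a simple Armijo condition
        alpha, c, rho = 1.0, 1e-4, 0.5
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = max(g_new @ (g_new - g) / (g @ g), 0.0)   # PR+ coefficient
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Example: minimise the 2-D Rosenbrock function, a standard test problem
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
print(conjugate_gradient(f, grad, np.array([-1.2, 1.0])))
```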

Bitcoin Price Prediction Using Deep Bayesian LSTM With Uncertainty Quantification: A Monte Carlo Dropout–Based Approach

2024-09
Stat (Issue : 3) (Volume : 13)
Bitcoin, one of the most successful cryptocurrencies, is gaining increasing popularity online and is being used in a variety of transactions. Recently, research on Bitcoin price prediction has received more attention, and researchers have investigated various state-of-the-art machine learning (ML) and deep learning (DL) models to predict Bitcoin price. However, despite these models providing promising predictions, they consistently exhibit uncertainty, which cannot be adequately quantified by classical ML models alone. Motivated by the enormous success of applying Bayesian approaches in several disciplines of ML and DL, this study aims to use Bayesian methods alongside Long Short-Term Memory (LSTM) to predict the closing Bitcoin price and consequently measure the uncertainty of the prediction model. Specifically, we adopted the Monte Carlo dropout (MC-Dropout) method with the Bayesian LSTM model to quantify the epistemic uncertainty of the model's predictions and provided confidence intervals for the predicted outputs. Experimental results showed that the proposed model is efficient and outperforms other state-of-the-art models in terms of root mean square error (RMSE), mean absolute error (MAE) and R2. Thus, we believe that these models may assist investors and traders in making critical decisions based on short-term predictions of Bitcoin price. This study illustrates the potential benefits of utilizing Bayesian DL approaches in time series analysis to improve data prediction accuracy and reliability.
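A minimal sketch of the MC-Dropout idea described above, using Keras: dropout is kept active at inference so that repeated forward passes yield a predictive distribution. Window size, layer widths, and dropout rate are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical shapes: 30-day windows of closing prices, one target value.
window, n_features = 30, 1

inputs = tf.keras.Input(shape=(window, n_features))
x = tf.keras.layers.LSTM(64)(inputs)
# training=True keeps dropout active at inference time (MC-Dropout)
x = tf.keras.layers.Dropout(0.2)(x, training=True)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# After fitting on windowed training data, draw T stochastic forward passes;
# their spread approximates the epistemic uncertainty of each prediction.
def mc_predict(model, X, T=100):
    samples = np.stack([model(X, training=True).numpy() for _ in range(T)])
    return samples.mean(axis=0), samples.std(axis=0)  # mean and uncertainty
```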

Leveraging Bayesian deep learning and ensemble methods for uncertainty quantification in image classification: A ranking-based approach

2024-01
Heliyon (Issue : 2) (Volume : 10)
Bayesian deep learning (BDL) has emerged as a powerful technique for quantifying uncertainty in classification tasks, surpassing the effectiveness of traditional models by aligning with the probabilistic nature of real-world data. This alignment allows for informed decision-making by not only identifying the most likely outcome but also quantifying the surrounding uncertainty. Such capabilities hold great significance in fields like medical diagnoses and autonomous driving, where the consequences of misclassification are substantial. To further improve uncertainty quantification, the research community has introduced Bayesian model ensembles, which combine multiple Bayesian models to enhance predictive accuracy and uncertainty quantification. These ensembles have exhibited superior performance compared to individual Bayesian models and even non-Bayesian counterparts. In this study, we propose a novel approach that leverages the power of Bayesian ensembles for enhanced uncertainty quantification. The proposed method exploits the disparity between predicted positive and negative classes and employs it as a ranking metric for model selection. For each instance or sample, the ensemble's output for each class is determined by selecting the top 'k' models based on this ranking. Experimental results on different medical image classifications demonstrate that the proposed method consistently outperforms or achieves comparable performance to conventional Bayesian ensembles. This investigation highlights the practical application of Bayesian ensemble techniques in refining predictive performance and enhancing uncertainty evaluation in image classification tasks.
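The ranking step can be sketched in a few lines of NumPy: for each sample, rank ensemble members by the disparity between their predicted positive and negative class probabilities and average only the top k. The array shapes and the averaging of the selected outputs are assumptions for illustration.

```python
import numpy as np

def ranked_ensemble_predict(member_probs, k=3):
    """member_probs: array of shape (n_models, n_samples, 2) holding each
    Bayesian member's predicted [negative, positive] class probabilities.
    For every sample, keep only the k members with the largest disparity
    |p_pos - p_neg| (the ranking metric) and average their outputs."""
    disparity = np.abs(member_probs[..., 1] - member_probs[..., 0])  # (M, N)
    # indices of the top-k most "decisive" members, per sample
    top_k = np.argsort(-disparity, axis=0)[:k]                        # (k, N)
    n = member_probs.shape[1]
    picked = member_probs[top_k, np.arange(n)]                        # (k, N, 2)
    return picked.mean(axis=0)                                        # (N, 2)

# Toy example with 5 models and 4 samples
probs = np.random.dirichlet([1, 1], size=(5, 4))
print(ranked_ensemble_predict(probs, k=3))
```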
2023

Intelligent Home: Empowering Smart Home with Machine Learning for User Action Prediction

2023-08
Science Journal of University of Zakho (Issue : 3) (Volume : 11)
Smart homes are an emerging technology that is transforming the way people live and interact with their homes. These homes are equipped with various devices and technologies that allow the homeowner to control, monitor, and automate various aspects of their home, including lighting, heating and cooling, security systems, and appliances. However, to enhance the efficiency of these homes, machine learning algorithms can be utilized to analyze the data generated from the home environment and adapt to user behaviors. This paper proposes a smart home system empowered by machine learning algorithms for enhanced user behavior prediction and automation. The proposed system is composed of three modes, namely manual, automatic, and intelligent, with the objectives of maximizing security, minimizing human effort, reducing power consumption, and facilitating user interaction. The manual mode offers control and monitoring capabilities through a web-based user interface, accessible from anywhere and at any time. The automatic mode provides security alerts and appliance control to minimize human intervention. Additionally, the intelligent mode employs machine learning classification algorithms, such as decision tree, K-nearest neighbors, and multi-layer perceptron, to track and predict user actions, thereby reducing user intervention and providing additional comfort to homeowners. Experiments conducted with the three classifiers resulted in accuracies of 97.4%, 97.22%, and 97.36%, respectively. The proposed smart home system can potentially enhance the quality of life for homeowners while reducing energy consumption and increasing security.
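A minimal sketch of the intelligent mode's classifier comparison, assuming a feature matrix X of home-state readings and labels y of user actions (both hypothetical names); the scikit-learn defaults used here are illustrative, not the system's tuned settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

def compare_action_predictors(X, y):
    """Train the three classifiers from the paper on smart-home event
    features and report held-out accuracy for each."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    models = {
        "decision tree": DecisionTreeClassifier(),
        "k-nearest neighbors": KNeighborsClassifier(),
        "multi-layer perceptron": MLPClassifier(max_iter=500),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        print(name, accuracy_score(y_te, model.predict(X_te)))
```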

Uncertainty Quantification for MLP-Mixer Using Bayesian Deep Learning

2023-04
Applied Sciences (Issue : 7) (Volume : 13)
Convolutional neural networks (CNNs) have become a popular choice for various image classification applications. However, the multi-layer perceptron mixer (MLP-Mixer) architecture has been proposed as a promising alternative, particularly for large datasets. Despite its advantages in handling large datasets and models, MLP-Mixer models have limitations when dealing with small datasets. This study aimed to quantify and evaluate the uncertainty associated with MLP-Mixer models on small datasets using Bayesian deep learning (BDL) methods, and to compare the results with existing CNN models. In particular, we examined the use of variational inference and Monte Carlo dropout methods. The results indicated that BDL can improve the performance of MLP-Mixer models by 9.2 to 17.4% in terms of accuracy across different mixer models. On the other hand, the results suggest that CNN models tend to show limited improvement, or even decreased performance in some cases, when using BDL. These findings suggest that BDL is a promising approach to improve the performance of MLP-Mixer models, especially for small datasets.
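For readers unfamiliar with the architecture, a minimal Mixer block is sketched below in Keras: a token-mixing MLP across patches followed by a channel-mixing MLP, each with a residual connection. Hidden sizes are illustrative; the paper's Bayesian treatment would add dropout or variational layers on top of such a block.

```python
import tensorflow as tf

def mixer_block(x, tokens_mlp_dim, channels_mlp_dim):
    """One MLP-Mixer block; x has shape (batch, n_patches, channels)."""
    # token mixing: operate along the patch axis
    y = tf.keras.layers.LayerNormalization()(x)
    y = tf.keras.layers.Permute((2, 1))(y)                 # (B, C, P)
    y = tf.keras.layers.Dense(tokens_mlp_dim, activation="gelu")(y)
    y = tf.keras.layers.Dense(x.shape[1])(y)               # back to P
    y = tf.keras.layers.Permute((2, 1))(y)                 # (B, P, C)
    x = x + y                                              # residual connection
    # channel mixing: operate along the feature axis
    y = tf.keras.layers.LayerNormalization()(x)
    y = tf.keras.layers.Dense(channels_mlp_dim, activation="gelu")(y)
    y = tf.keras.layers.Dense(x.shape[-1])(y)
    return x + y

# usage sketch: tokens = tf.keras.Input(shape=(n_patches, channels))
#               out = mixer_block(tokens, tokens_mlp_dim=256, channels_mlp_dim=512)
```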

Bayesian Deep Learning Applied to LSTM Models for Predicting COVID-19 Confirmed Cases in Iraq

2023-04
Science Journal of University of Zakho (Issue : 2) (Volume : 11)
The COVID-19 pandemic has had a huge impact on populations around the world and has caused critical problems for medical systems. With the increasing number of COVID-19 infections, research has focused on forecasting the confirmed cases to support the right medical decisions. Despite the huge number of studies conducted to forecast COVID-19 patients, the use of Deep Learning (DL) and Bayesian DL models in this field is limited in Iraq. Therefore, this research aims to predict the confirmed cases of COVID-19 in Iraq using a classical DL model, Long Short-Term Memory (LSTM), alongside a Bayesian LSTM model. The motivation behind using Bayesian Deep Learning (BDL) models is that they are capable of quantifying model uncertainty and provide better results without overfitting or underfitting. Monte Carlo (MC) Dropout, an approximation method, is added to the Bayesian LSTM to create numerous predictions for each instance and evaluate epistemic uncertainty. To evaluate the performance of our proposed models, four evaluation measures (MSE, RMSE, R2, MAE) were used. Experimental results showed that the proposed models were efficient and provided an R2 of 0.93 and 0.92 for the vanilla LSTM and Bayesian LSTM, respectively. Furthermore, the two proposed models were optimized using ADAM and SGD optimizers, with the results revealing that optimizing with ADAM provided more accurate results. Thus, we believe that these models may assist the government in making critical decisions based on short-term predictions of confirmed cases in Iraq.

Lightweight deep CNN-based models for early detection of COVID-19 patients from chest X-ray images

2023-03
Expert Systems with Applications
Hundreds of millions of people worldwide have recently been infected by the novel Coronavirus disease (COVID-19), causing significant damage to the health, economy, and welfare of the world's population. Moreover, the unprecedented number of patients with COVID-19 has placed a massive burden on healthcare centers, making timely and rapid diagnosis challenging. A crucial step in minimising the impact of such problems is to automatically detect infected patients and place them under special care as quickly as possible. Deep learning algorithms, such as Convolutional Neural Networks (CNN), can be used to meet this need. Despite the desired results, most of the existing deep learning-based models were built on millions of parameters (weights), which makes them inapplicable to devices with limited resources. Motivated by this fact, in this research, we developed two new lightweight CNN-based diagnostic models for the automatic and early detection of COVID-19 subjects from chest X-ray images. The first model was built for binary classification (COVID-19 and Normal), whereas the second one was built for multi-class classification (COVID-19, viral pneumonia, or normal). The proposed models were tested on a relatively large dataset of chest X-ray images, and the results showed that the accuracy rates of the 2- and 3-class-based classification models are 98.55% and 96.83%, respectively. The results also revealed that our models achieved competitive performance compared with the existing heavyweight models while significantly reducing cost and memory requirements for computing resources. With these findings, we can indicate that our models are helpful to clinicians in making insightful diagnoses of COVID-19 and are potentially easily deployable on devices with limited computational power and resources.
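A sketch of what "lightweight" means in practice: a few small convolutional blocks with global average pooling instead of large dense heads. The layer sizes below are illustrative assumptions, not the published architecture.

```python
import tensorflow as tf

def build_lightweight_cnn(n_classes=3, input_shape=(224, 224, 1)):
    """A deliberately small CNN in the spirit of the paper's lightweight
    models; the exact layer sizes here are illustrative."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),   # keeps the head tiny
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_lightweight_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.count_params())   # a fraction of a typical heavyweight CNN
```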

Application of Optimisation Algorithms to Engineering Design Problems and Discrepancies in Mathematical Formulas

2023-03
Applied Soft Computing (Volume : 140)
Engineering design optimisation problems have attracted the attention of researchers since they first appeared. Those who work on developing optimisation algorithms, in particular, apply their developed algorithms to these problems in order to test their new algorithms' capabilities. Mathematical discrepancies emerge during the implementation of the equations and constraints of these engineering problems, due to errors made in writing or transmitting the equations from one paper to another. Leaving these discrepancies unaddressed has a negative impact on the assessment and performance verification of newly developed algorithms, as well as on the decision-making process. To address this issue, this study investigates the mathematical discrepancies introduced by researchers in four well-known engineering design optimisation problems: Welded Beam Design (WBD), Speed Reducer Design (SRD), Cantilever Beam Design (CBD), and Multiple Disk Clutch Brake Design (MDCBD). We investigated several recently published papers in the literature, identified discrepancies in their mathematical formulas, and fixed them appropriately by referring and comparing them to the original problem. Furthermore, all mathematical discrepancies, references, parameters, cost functions, constraints, and constraint errors are highlighted, arranged, and organised in tables. As a result, this work can help readers and researchers avoid confusion and wasted time when working on these engineering design optimisation problems.
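As a concrete example of the kind of formula where discrepancies creep in, the commonly reproduced cost function of the Welded Beam Design problem is shown below; in keeping with the paper's point, even this form should be verified against the original problem statement before use.

```python
def wbd_cost(x):
    """Widely cited cost function of the Welded Beam Design (WBD) problem,
    with x = [h, l, t, b]: weld thickness, weld length, bar height, and
    bar thickness. Published variants of such formulas frequently
    disagree, so check against the original statement."""
    h, l, t, b = x
    return 1.10471 * h**2 * l + 0.04811 * t * b * (14.0 + l)

# A frequently reported optimum, giving a cost of roughly 1.7249
print(wbd_cost([0.2057, 3.4705, 9.0366, 0.2057]))
```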

Bayesian Deep Learning Methods Applied to Diabetic Retinopathy Disease: A Review

2023-01
Indonesian Journal of Electrical Engineering and Computer Science (Issue : 2) (Volume : 30)
Diabetic retinopathy (DR) is a complication of diabetes that causes retinal damage; it is therefore a leading cause of blindness. However, early detection of this disease can dramatically reduce the risk of vision loss. The main problem with early DR detection is that manual diagnosis by an ophthalmologist is time-consuming, expensive, and prone to misdiagnosis. Deep learning (DL) models have aided in the early diagnosis of DR, and DL is now frequently utilized in DR detection and classification. The main issue with classical DL models is that they are incapable of quantifying the uncertainty in the model, so they are prone to making wrong decisions in complex cases. However, Bayesian deep learning (BDL) models have recently evolved as a unified probabilistic framework that integrates DL and Bayesian models to provide an accurate framework for identifying all sources of uncertainty in the model. This paper introduces BDL, and the most recent research that used BDL approaches to treat diabetic retinopathy is reviewed and discussed. A thorough comparison of the existing Bayesian approaches in this topic is also presented. In addition, available datasets for the fundus retina, which are often employed in DR, are provided and reviewed.
2022

Analysis and Classification of Autism Data Using Machine Learning Algorithms

2022-11
Science Journal of University of Zakho (Issue : 4) (Volume : 10)
Autism is a neurodevelopmental disorder that affects children worldwide between the ages of 2 and 8 years. Children with autism have communication and social difficulties, and the current standardized clinical diagnosis of autism still relies on behavior-based tests. The rapidly growing number of autistic patients in the Kurdistan Region of Iraq necessitates reliable screening data; however, such data are scarce, making extensive evaluations of autism screening procedures more difficult. For this purpose, the use of machine learning algorithms for this disease to assist health practitioners in deciding whether a formal clinical diagnosis should be pursued was investigated. Data from 515 patients related to autism screening for young children were collected in Dohuk city. Three classification algorithms, namely DT, KNN, and ANN, were applied to diagnose and predict autism using various rating scales. Before applying the above classifiers, the newly obtained dataset underwent several data preprocessing steps. Since our data are unbalanced and high-dimensional, we suggest combining SMOTE (Synthetic Minority Oversampling Technique) and PCA (Principal Component Analysis) to improve the performance of the classification models. Experimental results showed that the combination of PCA and SMOTE methods improved classification performance. Moreover, ANN exceeded the other models in terms of accuracy and F1 score, suggesting that these classification methods could be used to diagnose autism in the future.
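A minimal sketch of the SMOTE + PCA + classifier idea using imbalanced-learn and scikit-learn; the pipeline order (oversample, then reduce dimensionality), the 95% variance cut-off, and the classifier settings are assumptions for illustration.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# SMOTE is applied only to training folds (imblearn's Pipeline guarantees
# this), followed by PCA for dimensionality reduction and an ANN classifier.
pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("pca", PCA(n_components=0.95)),   # keep 95% of the variance
    ("ann", MLPClassifier(max_iter=500)),
])

# X, y: the autism screening features and diagnosis labels
# print(cross_val_score(pipe, X, y, scoring="f1", cv=5).mean())
```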

Smart Homes for Disabled People: A Review Study

2022-11
Indonesian Journal of Electrical Engineering and Computer Science (Issue : 4) (Volume : 10)
The field of smart homes has gained notable attention from both academia and industry. The majority of the work has been directed at regular users, and less attention has been paid to users with special needs, particularly those with mobility problems or quadriplegia. Brain-computer interfaces have begun the mission of helping people with special needs in smart homes by developing an environment that allows them to make more independent decisions. This study investigates the efforts made in the literature on smart homes established to manage and control home components by disabled people, and compares the reviewed papers in terms of the controlled devices, the central controller, the people with disabilities the system is meant for, whether or not machine learning was used in the system, and the system's command method. The limitations of machine learning-based smart homes for disabled people are identified and discussed. Current challenges and possible future directions for further progress are also given.

Heart disease prediction based on pre-trained deep neural networks combined with principal component analysis

2022-08
Biomedical Signal Processing and Control
Heart Disease (HD) is often regarded as one of the deadliest human diseases. Therefore, early prediction of HD risks is crucial for prevention and treatment. Unfortunately, current clinical procedures for diagnosing HD are costly and often require an expert level of intervention. In response to this issue, researchers have recently developed various intelligent systems for the automated diagnosis of HD. Among the developed approaches, those based on artificial neural networks (ANNs) have gained more popularity due to their promising prediction results. However, to the authors’ knowledge, no research has attempted to exploit ANNs for feature extraction. Hence, research into bridging this gap is worthwhile for more excellent predictions. Motivated by this fact, this research proposes a new approach for HD prediction, utilising a pre-trained Deep Neural Network (DNN) for feature extraction, Principal Component Analysis (PCA) for dimensionality reduction, and Logistic Regression (LR) for prediction. Cleveland, a publicly accessible HD dataset, was used to investigate the efficacy of the proposed approach (DNN + PCA + LR). Experimental results revealed that the proposed approach performs well on both the training and testing data, with accuracy rates of 91.79% and 93.33%, respectively. Furthermore, the proposed approach exhibited better performance when compared with the state-of-the-art approaches under most of the evaluation metrics used.
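A minimal sketch of the DNN + PCA + LR pipeline: train a small network on the tabular features, reuse its penultimate activations as learned features, compress them with PCA, and classify with logistic regression. Layer sizes, the 10-component PCA, and training settings are illustrative, not the published configuration.

```python
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def dnn_pca_lr(X_train, y_train, X_test):
    """Feature extraction with a trained DNN, reduction with PCA,
    final prediction with logistic regression."""
    dnn = tf.keras.Sequential([
        tf.keras.Input(shape=(X_train.shape[1],)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu", name="features"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    dnn.compile(optimizer="adam", loss="binary_crossentropy")
    dnn.fit(X_train, y_train, epochs=50, verbose=0)     # the "pre-training"

    # Penultimate-layer activations serve as the extracted features
    extractor = tf.keras.Model(dnn.input, dnn.get_layer("features").output)
    pca = PCA(n_components=10)
    lr = LogisticRegression(max_iter=1000)
    lr.fit(pca.fit_transform(extractor.predict(X_train)), y_train)
    return lr.predict(pca.transform(extractor.predict(X_test)))
```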

A Review on Deep Sequential Models for Forecasting Time Series Data

2022-06
Applied Computational Intelligence and Soft Computing (Volume : 2022)
Deep sequential (DS) models have been extensively employed for forecasting time series data since the dawn of the deep learning era, providing forecasts for the values required in subsequent time steps. DS models, unlike traditional statistical models for forecasting time series data, can learn hidden patterns in temporal sequences and can memorize data from prior time points. Given the widespread usage of deep sequential models in several domains, a comprehensive study describing their applications is necessary. This work presents a comprehensive review of contemporary deep learning time series models, their performance in diverse domains, and an investigation of the models that were employed in various applications. Three deep sequential models, namely the artificial neural network (ANN), long short-term memory (LSTM), and temporal convolutional neural network (TCNN), along with their applications for forecasting time series data, are elaborated. We present a comprehensive comparison between such models in terms of application fields, model structure and activation functions, optimizers, and implementation, with the goal of learning more about the optimal model to use. Furthermore, the challenges and perspectives for the future development of deep sequential models are presented and discussed. We conclude that the LSTM model is the most widely employed, particularly in hybrid form, where hybrid models tend to yield the most accurate predictions.
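Since the temporal convolutional model is the least familiar of the three, a minimal TCN-style forecaster is sketched below using causal, dilated 1-D convolutions; the filter counts and dilation schedule are illustrative assumptions.

```python
import tensorflow as tf

def build_tcn(window=30, n_features=1):
    """Minimal TCN-style forecaster: stacked causal, dilated 1-D
    convolutions, so each prediction sees only past time steps."""
    inputs = tf.keras.Input(shape=(window, n_features))
    x = inputs
    for dilation in (1, 2, 4, 8):          # receptive field grows exponentially
        x = tf.keras.layers.Conv1D(32, kernel_size=3, padding="causal",
                                   dilation_rate=dilation,
                                   activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(1)(x)   # next-step forecast
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```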

Person-independent facial expression recognition based on the fusion of HOG descriptor and cuttlefish algorithm

2022-03
Multimedia Tools and Applications (Issue : 8) (Volume : 81)
This paper proposes an efficient approach for person-independent facial expression recognition based on the fusion of Histogram of Oriented Gradients (HOG) descriptor and Cuttlefish Algorithm (CFA). The proposed approach employs HOG descriptor due to its outstanding performance in pattern recognition, which results in features that are robust against small local pose and illumination variations. However, it produces some irrelevant and noisy features that slow down and degrade the classification performance. To address this problem, a wrapper-based feature selector, called CFA, is used. This is because CFA is a recent bio-inspired feature selection algorithm, which has been shown to effectively select an optimal subset of features while achieving a high accuracy rate. Here, support vector machine classifier is used to evaluate the quality of the features selected by the CFA. Experimental results validated the effectiveness of the proposed approach in attaining a high recognition accuracy rate on three widely adopted datasets: CK+ (97.86%), RaFD (95.15%), and JAFFE (90.95%). Moreover, the results also indicated that the proposed approach yields competitive or even superior results compared to state-of-the-art approaches.
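A minimal sketch of the HOG-to-SVM part of the pipeline; the CFA feature selector is not available as a library routine, so a boolean mask stands in for the subset of HOG features it would select.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def extract_hog(images):
    """HOG descriptors with common parameter choices (illustrative)."""
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

def train_with_selection(train_imgs, y_train, mask=None):
    """Fit an SVM on the HOG features kept by a selection mask; the mask
    is a placeholder for the wrapper-based CFA search in the paper."""
    X = extract_hog(train_imgs)
    if mask is None:
        mask = np.ones(X.shape[1], dtype=bool)   # keep all features
    clf = SVC(kernel="rbf")
    clf.fit(X[:, mask], y_train)
    return clf, mask
```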

A Review on Bayesian Deep Learning in Healthcare: Applications and Challenges

2022-03
IEEE Access
In the last decade, Deep Learning (DL) has revolutionised the use of artificial intelligence, and it has been deployed in different fields of healthcare applications such as image processing, natural language processing, and signal processing. DL models have also been intensely used in different tasks of healthcare such as disease diagnostics and treatments. Deep learning techniques have surpassed other machine learning algorithms and proved to be the ultimate tools for many state-of-the-art applications. Despite all that success, classical deep learning has limitations and their models tend to be very confident about their predicted decisions because it does not know when it makes mistake. For the healthcare field, this limitation can have a negative impact on models predictions since almost all decisions regarding patients and diseases are sensitive. Therefore, Bayesian deep learning (BDL) has been developed to overcome these limitations. Unlike classical DL, BDL uses probability distributions for the model parameters, which makes it possible to estimate the whole uncertainties associated with the predicted outputs. In this regard, BDL offers a rigorous framework to quantify all sources of uncertainties in the model. This study reviews popular techniques of using Bayesian deep learning with their benefits and limitations. It also reviewed recent deep learning architecture such as Convolutional Neural Networks and Recurrent Neural Networks. In particular, the applications of Bayesian deep learning in healthcare have been discussed such as its use in medical imaging tasks, clinical signal processing, medical natural language processing, and electronic health records. Furthermore, this paper has covered the deployment of Bayesian deep learning for some of the widespread diseases. This paper has also discussed the fundamental research challenges and highlighted some research gaps in both the Bayesian deep learning and healthcare perspective.
2021

A Review on Using ANOVA and RSM Modelling in The Glass Powder Replacement of The Concrete Ingredients

2021-08
Journal of Applied Science and Technology Trends (Issue : 2) (Volume : 2)
Using flat glass powder instead of sand yields concrete that is structurally serviceable and environmentally compatible. Adding glass powder to fibre-cement compounds in manufacturing may offer significant technical, economic, and environmental benefits. The cement content and the replacement of cement by glass powder are chosen as parameters of the concrete. When waste glass is ground to a very fine powder, it exhibits cementitious behaviour owing to its silica content. Statistical methods and techniques are heavily used in glass powder replacement studies. In this paper, fifteen papers published between 2012 and 2021 are reviewed and investigated to assess the use of statistical and modelling methods in discussing the replacement of concrete ingredients with glass powder. We found that most of the papers relied on the ANOVA test to perform their work. Moreover, central composite face-centred (CCF) designs and Response Surface Methodology (RSM) also featured in the studies. Across the various models, a quadratic model fitted with waste glass powder was reported in a number of the studies, indicating that glass waste powder performs well given its characteristics.

Combination of PCA with SMOTE Oversampling for Classification of High-Dimensional Imbalanced Data

2021-07
Bitlis Eren Üniversitesi Fen Bilimleri Dergisi (Issue : 3) (Volume : 10)
Imbalanced data classification is a common issue in data mining, where classifiers are skewed towards the larger data class. Classification of high-dimensional skewed (imbalanced) data is of great interest to decision-makers, as such data are more difficult to classify accurately. Dimension reduction, a process in which the number of variables is reduced, allows high-dimensional datasets to be interpreted more easily at the cost of a certain loss of information. In this study, a method combining SMOTE oversampling with principal component analysis is proposed to solve the imbalance problem in high-dimensional data. Three classification algorithms, consisting of Logistic Regression, K-Nearest Neighbor, and Decision Tree methods, and two separate datasets were utilized to evaluate the suggested method's efficacy and determine the classifiers' performance. The raw datasets and the datasets transformed by the PCA, SMOTE, and SMOTE+PCA (SMOTE combined with PCA) methods were analyzed in turn with the given algorithms. Analyses were performed using WEKA. The results suggest that almost all classification algorithms improve their classification performance using the PCA, SMOTE, and SMOTE+PCA methods. However, the SMOTE method gave more efficient results than the PCA and SMOTE+PCA methods for data rebalancing. Experimental results also suggest that the K-Nearest Neighbor classifier provided higher classification performance compared to the other algorithms.

Oversampling Method Based on Gaussian Distribution and K-Means Clustering

2021-06
Computers, Materials and Continua (Issue : 1) (Volume : 69)
Learning from imbalanced data is one of the most challenging problems in binary classification, and it has gained more importance in recent years. When the class distribution is imbalanced, classical machine learning algorithms tend to move strongly towards the majority class and disregard the minority. Therefore, the accuracy may be high, but the model cannot recognise data instances in the minority class to classify them, leading to many misclassifications. Different methods have been proposed in the literature to handle the imbalance problem, but most are complicated and tend to simulate unnecessary noise. In this paper, we propose a simple oversampling method based on the multivariate Gaussian distribution and K-means clustering, called GK-Means. The new method aims to avoid generating noise and to control imbalances both between and within classes. Various experiments have been carried out with six classifiers and four oversampling methods. Experimental results on different imbalanced datasets show that the proposed GK-Means outperforms other oversampling methods and improves classification performance as measured by F1-score and accuracy.
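The GK-Means idea can be sketched directly from the description: cluster the minority class with K-means, fit a multivariate Gaussian per cluster, and sample synthetic points from those local distributions. The cluster count and proportional allocation below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def gk_means_oversample(X_min, n_new, n_clusters=3, seed=0):
    """Sketch of the GK-Means idea: generate n_new synthetic minority
    samples from per-cluster Gaussians, keeping new points near real ones."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_min)
    synthetic = []
    for c in range(n_clusters):
        pts = X_min[labels == c]
        n_c = int(round(n_new * len(pts) / len(X_min)))  # proportional share
        if len(pts) < 2 or n_c == 0:
            continue
        mean, cov = pts.mean(axis=0), np.cov(pts, rowvar=False)
        synthetic.append(rng.multivariate_normal(mean, cov, size=n_c))
    return (np.vstack(synthetic) if synthetic
            else np.empty((0, X_min.shape[1])))
```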

Face Recognition Based on Gabor Feature Extraction Followed by FastICA and LDA

2021-04
Computers, Materials and Continua (Issue : 2) (Volume : 68)
Over the past few decades, face recognition has become the most effective biometric technique in recognising people's identity, as it is widely used in many areas of our daily lives. However, it is a challenging technique since facial images vary in rotations, expressions, and illuminations. To minimize the impact of these challenges, exploiting information from various feature extraction methods is recommended since one of the most critical tasks in face recognition system is the extraction of facial features. Therefore, this paper presents a new approach to face recognition based on the fusion of Gabor-based feature extraction, Fast Independent Component Analysis (FastICA), and Linear Discriminant Analysis (LDA). In the presented method, first, face images are transformed to grayscale and resized to have a uniform size. After that, facial features are extracted from the aligned face image using Gabor, FastICA, and LDA methods. Finally, the nearest distance classifier is utilised to recognise the identity of the individuals. Here, the performance of six distance classifiers, namely Euclidean, Cosine, Bray-Curtis, Mahalanobis, Correlation, and Manhattan, are investigated. Experimental results revealed that the presented method attains a higher rank-one recognition rate compared to the recent approaches in the literature on four benchmarked face datasets: ORL, GT, FEI, and Yale. Moreover, it showed that the proposed method not only helps in better extracting the features but also in improving the overall efficiency of the facial recognition system.
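A minimal sketch of the Gabor to FastICA to LDA chain with scikit-image and scikit-learn; for brevity, per-filter summary statistics replace the dense per-pixel Gabor responses the paper would use, and the filter-bank settings and component counts are illustrative.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.decomposition import FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def gabor_features(img, freqs=(0.1, 0.2), thetas=(0, np.pi/4, np.pi/2)):
    """Small Gabor filter bank; mean and variance of each real response
    serve as compact per-filter features (a simplification)."""
    feats = []
    for f in freqs:
        for t in thetas:
            real, _ = gabor(img, frequency=f, theta=t)
            feats.extend([real.mean(), real.var()])
    return np.array(feats)

def fit_pipeline(train_imgs, y):
    X = np.array([gabor_features(im) for im in train_imgs])
    ica = FastICA(n_components=min(10, X.shape[1]), random_state=0)
    lda = LinearDiscriminantAnalysis()
    Z = lda.fit_transform(ica.fit_transform(X), y)
    return ica, lda, Z   # match new faces against Z by nearest distance
```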

Rule Generation Based on Modified Cuttlefish Algorithm for Intrusion Detection System

2021-04
Uludağ University Journal of The Faculty of Engineering (Issue : 1) (Volume : 26)
Nowadays, with the rapid prevalence of networked machines and Internet technologies, intrusion detection systems are increasingly in demand. Consequently, numerous illicit activities by external and internal attackers need to be detected, and early detection of such activities is necessary for protecting data and information. In this paper, we investigated the use of the Cuttlefish optimization algorithm as a new rule generation method for the classification task to deal with the intrusion detection problem. The effectiveness of the proposed method was tested using the KDD Cup 99 dataset based on different evaluation methods. The obtained results were also compared with the results obtained by some classical well-known algorithms, namely Decision Tree (DT), Naïve Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbour (K-NN). Our experimental results showed that the proposed method demonstrates a good classification performance and provides significantly preferable results when compared with the performance of the other traditional algorithms. The proposed method produced 93.9%, 92.2%, and 94.7% in terms of precision, recall, and area under curve, respectively.
2020

Improving Classification Performance for a Novel Imbalanced Medical Dataset using SMOTE Method

2020-06
International Journal of Advanced Trends in Computer Science and Engineering (Issue : 3) (Volume : 9)
In recent decades, machine learning algorithms have been used in many different fields; one of the most prominent is the health sector. Biomedical data are usually extensive in size and very hard for humans to analyze and interpret. For this purpose, machine learning models are used to extract hidden patterns from data. In this paper, we aim to analyze, diagnose, and classify diabetes patients using six machine learning algorithms on a new real diabetes dataset. The newly created dataset, called ZADA, is obtained from the medical records of about 7000 patients in Zakho city, Kurdistan Region of Iraq. However, our new dataset is imbalanced, which means one class is the minority and the other is the majority. Class imbalance is a challenging problem in classification, especially in two-class datasets. When class distributions are imbalanced, traditional machine learning methods often give low classification performance for unseen samples of the minority class, because the model tends to be strongly directed by the majority class. To overcome these problems, we first examine the impact of the imbalanced data on classification performance and then use a resampling method to rebalance the data. Furthermore, we utilized three normalization techniques as a preprocessing step to further improve the performance of the machine learning algorithms. Therefore, we propose a classification analysis based on the three normalization methods along with the resampling (SMOTE) method to tackle the class imbalance problem. Various experiments were conducted to find the algorithm with the best performance according to the distribution of the minority class. Results show that the resampling method improves classification performance.

A Fully Bayesian Logistic Regression Model for Classification of Zada Diabetes Dataset

2020-06
Science Journal of University of Zakho (SJUOZ) (Issue : 3) (Volume : 8)
Classification of diabetes data with existing data mining and machine learning algorithms is challenging, and the predictions are not always accurate. We aim to build a model that effectively addresses these challenges (misclassification) and can accurately diagnose and classify diabetes. In this study, we investigated the use of Bayesian Logistic Regression (BLR) for mining such data to diagnose and classify various diabetes conditions. The approach is fully Bayesian and well suited to automated Markov Chain Monte Carlo (MCMC) simulation. Using Bayesian methods in analyzing medical data is useful because of the rich hierarchical models, uncertainty quantification, and prior information they provide. The analysis was done on a real medical dataset created for 909 patients in Zakho city with a binary class label and seven independent variables. Three different prior distributions (Gaussian, Laplace, and Cauchy) were investigated for our proposed model implemented by MCMC. The performance and behavior of the Bayesian approach were illustrated and compared with traditional classification algorithms on this dataset using 10-fold cross-validation. Experimental results show that, overall, classification under BLR with informative Gaussian priors performed better in terms of various accuracy metrics, providing an accuracy of 92.53%, a recall of 94.85%, a precision of 91.42%, and an F1 score of 93.11%. These results suggest that it is worthwhile to explore the application of BLR with informative prior distributions to predictive modeling tasks in medical studies.
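A minimal PyMC sketch of the fully Bayesian logistic regression with a selectable prior family; the unit prior scale and sampler settings are illustrative assumptions, not the study's exact configuration.

```python
import pymc as pm

def fit_blr(X, y, prior="gaussian"):
    """Bayesian logistic regression sampled via MCMC, with Gaussian,
    Laplace, or Cauchy priors on the coefficients."""
    families = {"gaussian": pm.Normal,
                "laplace": pm.Laplace,
                "cauchy": pm.Cauchy}
    dist = families[prior]
    with pm.Model():
        w = dist("w", 0.0, 1.0, shape=X.shape[1])   # coefficient priors
        b = dist("b", 0.0, 1.0)                      # intercept prior
        p = pm.math.sigmoid(pm.math.dot(X, w) + b)
        pm.Bernoulli("y", p=p, observed=y)
        return pm.sample(2000, tune=1000, chains=2)  # MCMC (NUTS)
```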
2019

A simple Bayesian approach to tree-ring dating

2019-03
Archaeometry - University of Oxford (Issue : 4) (Volume : 61)
Tree‐ring dating involves matching sequences of ring widths from undated timbers to dated sequences known as ‘master’ chronologies. Conventionally, the undated timbers (from a building or woodland) are sequentially matched against one another, using t‐tests to identify the relative offsets with the ‘best’ match, thus producing a ‘site’ chronology. A date estimate is obtained when this is matched to a local master chronology of known calendar age. Many tree‐ring sequences in the UK produce rather low t‐values and are thus declared not to have a ‘best’ match to a master chronology. Motivated by this and the routine use of Bayesian statistical methods to provide a probabilistic approach to radiocarbon dating, this paper investigates the practicality of Bayesian dendrochronology. We explore a previously published model for the relationship between ring widths and the underlying climatic signal, implementing it within the Bayesian framework via a simulation‐based approach. Probabilities for a match at each offset are produced, removing the need to identify a single ‘best’ match. The Bayesian model proves successful at matching in both simulated and real examples.
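A toy version of the offset-probability idea, assuming a simple Gaussian likelihood as a stand-in for the paper's ring-width model: score every offset of the undated series against the master chronology and normalise the scores into match probabilities.

```python
import numpy as np

def offset_probabilities(sample, master, sigma=1.0):
    """Probability of a match at each offset of `sample` within `master`,
    under a stand-in Gaussian likelihood for the residual ring widths."""
    n, m = len(sample), len(master)
    log_lik = []
    for off in range(m - n + 1):
        resid = sample - master[off:off + n]
        log_lik.append(-0.5 * np.sum((resid / sigma) ** 2))
    log_lik = np.array(log_lik)
    probs = np.exp(log_lik - log_lik.max())   # stabilised exponentiation
    return probs / probs.sum()                # one probability per offset
```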
