
Published Journal Articles

2024

A Remedial Measure of Multicollinearity in Multiple Linear Regression in the Presence of High Leverage Points

2024-04
Sains Malaysiana (Issue : 4) (Volume : 53)
Ordinary least squares (OLS) is the most widely used method for fitting the multiple linear regression model, owing to tradition and its optimal properties. Nonetheless, in the presence of multicollinearity, OLS is inefficient because the standard errors of its estimates become inflated. Many methods have been proposed to remedy this problem, including Jackknife Ridge Regression (JAK). However, JAK performs poorly when both multicollinearity and high leverage points (HLPs), which are outlying observations in the X-direction, are present in the data. As a solution, the Robust Jackknife Ridge MM (RJMM) and Robust Jackknife Ridge GM2 (RJGM2) estimators have been put forward. Nevertheless, they are still not very efficient: they have long computational running times, retain some bias, and lack the bounded influence property. This paper proposes a robust jackknife ridge regression that integrates a generalized M and fast improvised Gt (GM-FIMGT) estimator. We name this method the robust jackknife ridge regression based on GM-FIMGT, denoted RJFIMGT. The numerical results show that RJFIMGT is the best method in this study, as it has the lowest RMSE and bias among the methods compared.
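For orientation only, the classical (non-robust) jackknife ridge estimator can be written as beta_JRR = (I - k^2 A^-2) beta_OLS with A = X'X + kI, which is the same as applying (I + k A^-1) to the ordinary ridge estimate. The pure-Python sketch below assumes two centred predictors, no intercept, made-up data, and a hand-picked ridge constant k; the function names are illustrative, and this is not the robust RJFIMGT estimator, whose GM-FIMGT weighting is not reproduced.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec2(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def jackknife_ridge(x1, x2, y, k):
    """Return (OLS, ridge, jackknife ridge) coefficients for centred x1, x2 (no intercept)."""
    s12 = sum(a * b for a, b in zip(x1, x2))
    xtx = [[sum(a * a for a in x1), s12], [s12, sum(b * b for b in x2)]]
    xty = [sum(a * c for a, c in zip(x1, y)), sum(b * c for b, c in zip(x2, y))]
    beta_ols = matvec2(inv2(xtx), xty)
    a_inv = inv2([[xtx[0][0] + k, xtx[0][1]], [xtx[1][0], xtx[1][1] + k]])
    beta_ridge = matvec2(a_inv, xty)
    a_inv_sq = matmul2(a_inv, a_inv)
    # Jackknife ridge: (I - k^2 A^{-2}) applied to the OLS estimate
    shrink = [[(1.0 if i == j else 0.0) - k * k * a_inv_sq[i][j] for j in range(2)]
              for i in range(2)]
    beta_jrr = matvec2(shrink, beta_ols)
    return beta_ols, beta_ridge, beta_jrr


# Two nearly collinear, centred predictors (illustrative data)
x1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x2 = [-2.4, -1.6, -0.4, 0.6, 1.4, 2.4]
y = [-5.1, -2.9, -0.8, 0.8, 3.1, 4.9]
ols, ridge, jrr = jackknife_ridge(x1, x2, y, k=0.5)
print(ols, ridge, jrr)
```

With k = 0 all three estimates coincide; for k > 0 the jackknife step undoes part of the ridge shrinkage, so its coefficient vector lies between the ridge and OLS solutions in length.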
2023

An efficient method of identification of influential observation in multiple linear regression and its application to real data

2023-11
Sains Malaysiana (Issue : 12) (Volume : 52)
Influential observations (IOs) are observations that, alone or together with several other observations, have a detrimental effect on the computed values of various estimates. It is therefore very important to detect them. Several methods have been proposed to identify IOs, including the fast improvised influential distance (FIID), which has been shown to be more efficient than some existing methods. Nonetheless, FIID is computationally unstable, still suffers from masking and swamping effects, is time-consuming, and does not use a proper cut-off point. To address these problems, a new robust version of the influential distance method (RFIID), based on the Reweighted Fast Consistent and High Breakdown (RFCH) estimator, is proposed. Results from real data and a Monte Carlo simulation study indicate that RFIID correctly separates the IOs from the rest of the data, with the shortest computational running time, the least swamping effect, and no masking effect among the methods compared in this study.
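RFIID itself relies on the RFCH estimator, which is beyond a short example. As a classical point of reference only, Cook's distance measures how much the fitted values change when observation i is deleted: D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2. The sketch below computes it for a simple linear regression on made-up data; the 4/n flagging rule is a common rule of thumb assumed here, not the cut-off used in the paper.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression y = a + b*x."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)           # residual variance
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]   # leverage (hat) values
    return [e * e / (p * s2) * hi / (1 - hi) ** 2 for e, hi in zip(resid, h)]


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 5.0]
d = cooks_distance(x, y)
flagged = [i for i, di in enumerate(d) if di > 4 / len(x)]  # 4/n rule of thumb (assumed)
print(flagged)
```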

Detection of Outlier in Time Series with Application to Dohuk Dam Using the SCA Statistical System

2023-08
General Letters in Mathematics (GLM) (Issue : 2) (Volume : 13)
Outliers are data points or observations that stand out significantly from the rest of the group in terms of size or frequency. They are also referred to as "abnormal data". Before a forecasting model is fitted, outliers are often removed from the data set; if they are not removed, the forecasting model is modified to account for them. The study first covers outlier detection when the model parameters are known, and then the case where the parameters are unknown. The article also discusses several reasons for detecting and correcting outliers in time series analysis and forecasting. The data consist of a time series of the volume of water entering the Dohuk Dam reservoir in Dohuk city. The study reached three conclusions. First, as the critical value increased, the residual standard error (with outlier adjustment) increased. Second, the number of detected outliers decreased each time the critical value was raised. Third, when outliers are present, forecasts with outlier adjustment perform better than forecasts without it.
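The iterative outlier procedure of the SCA system is not reproduced here. As a simplified sketch, assuming residuals from an already fitted time series model and a robust scale estimate (the MAD), the function below flags time points whose standardized residual exceeds a critical value C. It illustrates the second conclusion: raising C can only shrink the set of flagged points. The residual values are made up.

```python
import statistics


def flag_outliers(residuals, critical):
    """Indices whose robustly standardized residual exceeds the critical value."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    sigma = 1.4826 * mad  # MAD rescaled to estimate sigma under normality
    return [t for t, r in enumerate(residuals) if abs(r - med) / sigma > critical]


# Illustrative residuals from an already fitted time series model
resid = [0.3, -0.5, 0.1, 0.4, -0.2, 4.0, 0.2, -0.3, 0.5, -2.1, 0.0, -0.4]
for c in (2.5, 3.0, 3.5, 4.0, 5.0):
    print(c, flag_outliers(resid, c))
```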

A CLASSIFICATION OF OUTLIERS IN TRANSFORMED VARIABLES

2023-04
Journal of University of Duhok (Issue : 1) (Volume : 28)
Diagnosing outliers is essential because they can cause serious interpretive problems in both linear and nonlinear regression analysis. Much work has been done on identifying outliers in linear regression, but little in nonlinear regression. In practice, the assumptions of linear regression are often violated, for example when highly influential outliers exist in the data set, which adversely affects the validity of the statistical analysis. Finding outliers matters because they have a large impact on the computed values of various estimates and therefore lead to invalid inferences and inaccurate predictions. Outliers must be divided into vertical outliers (VO), good leverage points (GLP), and bad leverage points (BLP), since only vertical outliers and bad leverage points have an undue effect on parameter estimates. We compare several outlier detection techniques that use a robust diagnostic plot to classify good leverage points, bad leverage points, and vertical outliers while reducing both masking and swamping, for both untransformed and transformed variables. The main idea is to detect outliers both before transformation (in the original data) and after transformation. Results from simulated and numerical examples indicate that the plot of the modified generalized DFFITS (difference in fits) against the Diagnostic Robust Generalized Potential (MGDFF-DRGP) successfully detects outliers in the data.
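A diagnostic plot of this kind places a residual-type measure on one axis and a leverage-type measure on the other, and two cut-offs split the plane into four regions. The sketch below implements only that final classification step. The input values, the residual cut-off of 3, and the leverage cut-off passed by the caller are assumptions for illustration; the MGDFF and DRGP statistics themselves are not computed here.

```python
def classify_point(robust_residual, robust_leverage, lev_cut, res_cut=3.0):
    """Label one observation using the four regions of a diagnostic plot."""
    outlying_in_y = abs(robust_residual) > res_cut
    high_leverage = robust_leverage > lev_cut
    if high_leverage and outlying_in_y:
        return "bad leverage point"
    if high_leverage:
        return "good leverage point"
    if outlying_in_y:
        return "vertical outlier"
    return "regular"


# Hypothetical (residual, leverage) pairs and leverage cut-off
points = [(0.4, 0.8), (5.2, 0.9), (0.3, 7.5), (-6.1, 8.2)]
for r, lev in points:
    print(r, lev, classify_point(r, lev, lev_cut=3.0))
```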
2022

K-Nearest Neighbor Method with Principal Component Analysis for Functional Nonparametric Regression

2022-11
Baghdad Science Journal (Issue : 6) (Volume : 19)
This paper proposes a new method for functional nonparametric regression based on conditional expectation, for the case where the covariates are functional and principal component analysis (PCA) is used to de-correlate the multivariate response variables. Predictions use the Nadaraya-Watson estimator with k-nearest neighbours (KNN), and the closeness between curves is measured with different semi-metrics based on the second derivative and on functional principal component analysis (FPCA). The model is evaluated by root mean square error (RMSE) and compared with the independent response method; the analysis is carried out in R. When the covariates are functional and PCA is used to de-correlate the multivariate responses, the results are preferable to those of the independent response method. The models are demonstrated on both simulated and real data.
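A minimal sketch of k-nearest-neighbour prediction for curves, assuming every curve is observed on a common, equally spaced grid. The semi-metric compares second derivatives approximated by central finite differences (a simplification of the usual B-spline or FPCA versions), and the prediction is the plain average of the k nearest responses, i.e. a KNN Nadaraya-Watson estimator with a uniform kernel. The grid, data, and k are illustrative, and the PCA de-correlation of multivariate responses is not shown.

```python
import math


def second_derivative(curve, h):
    """Central finite-difference second derivative on an equally spaced grid."""
    return [(curve[i + 1] - 2 * curve[i] + curve[i - 1]) / (h * h)
            for i in range(1, len(curve) - 1)]


def semimetric_d2(c1, c2, h):
    """L2 distance between second derivatives (a semi-metric: ignores linear trends)."""
    d1, d2 = second_derivative(c1, h), second_derivative(c2, h)
    return math.sqrt(h * sum((a - b) ** 2 for a, b in zip(d1, d2)))


def knn_predict(curves, responses, new_curve, k, h):
    """Average response of the k training curves closest to new_curve."""
    ranked = sorted(zip((semimetric_d2(c, new_curve, h) for c in curves), responses))
    return sum(y for _, y in ranked[:k]) / k


h = 0.05
grid = [i * h for i in range(21)]
# Training curves a*t^2 + t with scalar response a (illustrative)
curves = [[a * t * t + t for t in grid] for a in (1, 2, 3, 4, 5)]
responses = [1.0, 2.0, 3.0, 4.0, 5.0]
new = [2.1 * t * t - 0.7 * t + 0.3 for t in grid]  # same shape family, different linear part
print(knn_predict(curves, responses, new, k=2, h=h))
```

Because the second derivative removes linear trends, the new curve is matched on curvature alone, which is the kind of behaviour a derivative-based semi-metric is chosen for.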

Estimating Regression Coefficients using Robust Bootstrap with application to Covid-19 Data

2022-08
General Letters in Mathematics (Issue : 2) (Volume : 12)
The linear regression model is often used by researchers and data analysts for predictive, descriptive, and inferential purposes. With empirical data, however, the assumptions on which this model rests are not always satisfied. One answer is to use more complex regression methods that do not rely strictly on these assumptions. Transformations, however, offer a simpler way to improve the validity of the model assumptions while allowing the user to keep the familiar linear regression model. The main objective of this work is to provide a transformation for the response and predictor variables of the linear model, together with parameter estimation methods before and after the transformation. The paper notes that the bootstrap approach has been used effectively for many statistical estimation and inference problems.
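As a plain illustration of the bootstrap idea referred to here (the robust bootstrap variant used in the paper is not specified in the abstract and is not reproduced), the sketch below resamples (x, y) pairs with replacement, refits the OLS slope each time, and reports a percentile interval. The number of resamples, the seed, and the data are assumptions.

```python
import random


def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx


def bootstrap_slope_ci(x, y, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the OLS slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # slope undefined if every resampled x is equal
            continue
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    slopes.sort()
    alpha = (1 - level) / 2
    return slopes[round(alpha * n_boot)], slopes[round((1 - alpha) * n_boot) - 1]


x = list(range(1, 13))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.2, -0.1, 0.1]
y = [2 * xi + e for xi, e in zip(x, noise)]
print(ols_slope(x, y), bootstrap_slope_ci(x, y))
```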

A Nonlinear Transformation Methods Using Covid-19 Data in the Kurdistan Region

2022-04
2022 International Conference on Computer Science and Software Engineering (CSASE) (Issue : 6)
Ordinary least squares (OLS) is the most widely used method for estimating the parameters of linear and nonlinear regression models, owing to tradition and its optimal properties. Nevertheless, when the data contain outliers, OLS estimates become inefficient, and even a single unusual point can have a significant impact on the parameter estimates. One remedy in the presence of outliers is to use robust estimators rather than OLS. Another is to find a suitable nonlinear transformation that reduces anomalies such as non-additivity, heteroscedasticity, and non-normality in multiple nonlinear regression. It can be beneficial to transform the response variable, the predictor variables, or both, so that the equation takes a simple functional form that is linear in the transformed variables. To identify the best transformation function, we compare the squared correlation coefficient (coefficient of …
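The abstract is cut off, but it describes choosing among transformations by comparing a squared correlation coefficient. A minimal sketch of such a comparison for a single predictor, assuming three candidate transformations of the response and made-up exponential data: each candidate is scored by the R^2 of a straight-line fit. R^2 values computed on different response scales are only a rough guide, and this is not the paper's procedure.

```python
import math


def r_squared(x, y):
    """R^2 of a straight-line fit, i.e. the squared Pearson correlation of x and y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


CANDIDATES = {"identity": lambda v: v, "sqrt": math.sqrt, "log": math.log}


def compare_transformations(x, y):
    scores = {name: r_squared(x, [f(v) for v in y]) for name, f in CANDIDATES.items()}
    return max(scores, key=scores.get), scores


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(0.5 * xi) for xi in x]  # exponential growth (illustrative)
best, scores = compare_transformations(x, y)
print(best, scores)
```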
2021

Robust Multicollinearity Diagnostic Measure For Fixed Effect Panel Data Model

2021-10
Malaysian J. Fundam. Appl. Sci. (Issue : 5) (Volume : 17)
It is now evident that high leverage points (HLPs) can induce multicollinearity in data for the fixed effect panel data model. The observations responsible for this phenomenon are called high leverage collinearity-enhancing observations (HLCEOs). The within-group ordinary least squares (WOLS) estimator, commonly used to estimate the parameters of the fixed effect panel data model, is easily affected by HLCEOs. In their presence, WOLS estimates may have large variances, leading to erroneous interpretation. It is therefore imperative to detect multicollinearity caused by HLCEOs. The classical Variance Inflation Factor (CVIF) is the commonly used method for diagnosing multicollinearity in panel data; however, it does not diagnose multicollinearity correctly in the presence of HLCEOs. Hence, this paper proposes three new robust methods for diagnosing multicollinearity in panel data, namely RVIF (WGM-FIMGT), RVIF (WGMDRGP), and RVIF (WMM), and compares their performance with that of CVIF. The numerical evidence shows that, in the presence of HLCEOs, CVIF incorrectly diagnoses multicollinearity, whereas the proposed methods correctly diagnose its absence; RVIF (WGM-FIMGT) is the best method, as it has the shortest computational running time.
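For context, the within-group (fixed effect) transformation subtracts each individual's mean from that individual's observations before OLS, and the classical VIF of a regressor is 1/(1 - R_j^2), where R_j^2 comes from regressing it on the other regressors. With exactly two regressors, R_j^2 is simply their squared correlation. The sketch below computes CVIF on pooled and on within-transformed data for a made-up three-individual panel; the robust RVIF variants proposed in the paper are not shown.

```python
from collections import defaultdict


def within_transform(groups, values):
    """Subtract each individual's mean: the fixed effect within transformation."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


def vif_two(x1, x2):
    """Classical VIF for either regressor when there are exactly two: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return 1.0 / (1.0 - s12 * s12 / (s11 * s22))


groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x1 = [1, 2, 3, 11, 12, 13, 21, 22, 23]
x2 = [6, 3, 6, 16, 13, 16, 26, 23, 26]
print(vif_two(x1, x2))                                                      # pooled data
print(vif_two(within_transform(groups, x1), within_transform(groups, x2)))  # within data
```

Here the pooled CVIF is large only because the individual means move together; after the within transformation the regressors are uncorrelated.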

Fast Improvised Influential Distance for the Identification of Influential Observations in Multiple Linear Regression

2021-08
Sains Malaysiana (Issue : 7) (Volume : 50)
Influential observations (IO) are observations that lead to misleading conclusions about the fit of a multiple linear regression model. Existing IO identification methods, such as the influential distance (ID), are not very successful in detecting IO. A likely reason is that ID uses an inefficient method, with a long computational running time, to identify suspected IO at its initial step. Moreover, this method declares good leverage observations to be IO, resulting in misleading conclusions. In this paper, we propose the fast improvised influential distance (FIID), which successfully identifies IO, good leverage observations, and regular observations with a shorter computational running time. A Monte Carlo simulation study and real data examples show that FIID correctly identifies genuine IO in the multiple linear regression model with no masking and a negligible swamping rate.

Simple and Fast Generalized - M (GM) Estimator and Its Application to Real Data Set

2021-08
J. of University of Anbar for Pure Science (Issue : 3) (Volume : 7)
It is now evident that some robust methods, such as the MM-estimator, do not have a bounded influence function, which means that their estimates can still be affected by outliers in the X-direction, or high leverage points (HLPs), even though they have high efficiency and a high breakdown point (BDP). Generalized M (GM) estimators, such as the GM6 estimator, were put forward mainly to bound the influence of HLPs through a weight function. The limitation of GM6 is that it down-weights both bad leverage points (BLPs) and good leverage points (GLPs), which reduces its efficiency when more GLPs are present in a data set. Moreover, GM6 has a long computational time. In this paper, we develop a new version of the GM-estimator based on a simple and fast algorithm. Its attractive feature is that it down-weights only BLPs and vertical outliers (VOs), which increases its efficiency. The merit of the proposed GM estimator is studied through a simulation study and the well-known aircraft data set.
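The weighting scheme of the proposed estimator is not given in the abstract. To show only the general GM idea, the sketch below fits a simple linear regression by iteratively reweighted least squares, multiplying a Huber weight on the scaled residual by a Mallows-type weight on leverage so that points far out in X have bounded influence. The Huber constant 1.345, the leverage weight min(1, (2p/n)/h_ii) built from classical hat values, the median-absolute-residual scale, and the data are all assumptions. Note that this simple leverage weight shares the drawback the abstract describes for GM6: it down-weights good leverage points as well as bad ones.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return yb - b * xb, b


def mallows_gm(x, y, c=1.345, iters=50):
    """Mallows-type GM fit: Huber weights on residuals times leverage weights on x."""
    n, p = len(x), 2
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    hat = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    lev_w = [min(1.0, (2 * p / n) / h) for h in hat]  # bound the influence of high leverage
    a, b = weighted_line(x, y, [1.0] * n)              # start from OLS
    for _ in range(iters):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        # scale from the median absolute residual, rescaled for normality
        s = max(1.4826 * statistics.median([abs(ri) for ri in r]), 1e-12)
        huber = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a, b = weighted_line(x, y, [lw * hw for lw, hw in zip(lev_w, huber)])
    return a, b


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3.2, 4.9, 7.1, 8.8, 11.2, 12.9, 15.1, 16.8, 19.1, 5.0]  # last point: bad leverage
print("OLS:", weighted_line(x, y, [1.0] * len(x)))
print("GM :", mallows_gm(x, y))
```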
2018

ROBUST WITHIN GROUP ESTIMATOR FOR FIXED EFFECT PANEL DATA

2018-08
Pak. J. Statist. (Issue : 4) (Volume : 34)
In the presence of outliers, panel data estimators can be extremely biased. In this work, we use mm-centering to obtain robust Within Group parameter estimates. The main contribution of this article is a new version of the Generalized M-estimator (GM) that provides good resistance against bad leverage points. Its advantage over existing methods is that it down-weights only bad leverage points and outliers. Good leverage points are not down-weighted, which increases the efficiency of the estimator. The effectiveness of the proposed estimator is investigated using real and simulated data sets.

Fast improvised diagnostic robust measure for the identification of high leverage points in multiple linear regression

2018-08
Journal of Statistics and Management Systems (Issue : 6) (Volume : 21)
High leverage points (HLPs) in a data set have a massive effect on linear regression. Identifying them is very important because they can lead to wrong conclusions about the regression model. The Diagnostic Robust Generalized Potential based on Index Set Equality, IDRGP (ISE), identifies high leverage points very successfully and with a short running time. However, it still exhibits a small rate of masking and swamping. A fast improvised method for identifying high leverage points is proposed to reduce swamping and masking with a faster running time. The results of the simulation study show the merit of the proposed method.
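The robust diagnostics discussed here refine the classical leverage (hat) values. As a baseline only, the sketch computes h_ii = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2 for simple linear regression and flags points with h_ii > 2p/n, a common rule of thumb assumed here. The made-up data also show the masking problem that robust methods target: a single far-out point is flagged, but a cluster of four identical far-out points pulls the mean and inflates the spread enough that none of them is flagged.

```python
def hat_values(x):
    """Leverage h_ii for simple linear regression with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]


def flag_hlp(x, p=2):
    cut = 2 * p / len(x)  # common 2p/n rule of thumb (assumed)
    return [i for i, h in enumerate(hat_values(x)) if h > cut]


base = list(range(1, 11))
print(flag_hlp(base + [30]))              # one isolated HLP: [10]
print(flag_hlp(base + [30, 30, 30, 30]))  # four clustered HLPs are masked: []
```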

The Performance of fast robust Variance Inflation Factor in the presence of high leverage points

2018-08
Journal of Engineering and Applied Sciences (Issue : 16) (Volume : 13)
Detecting multicollinearity is crucial so that proper remedial measures can be taken when it is present. The most widely used diagnostic for multicollinearity in multiple linear regression is the Classical Variance Inflation Factor (CVIF). It is now evident that CVIF fails to detect multicollinearity correctly when high leverage points are present in a data set. The Robust Variance Inflation Factor (RVIF) was introduced to remedy this problem. Nonetheless, RVIF is slow to compute because it is based on the robust GM (DRGP) estimator, which depends on the computationally expensive Minimum Volume Ellipsoid (MVE) estimator. In this paper, we propose a fast RVIF (FRVIF) that takes less computing time. The results of the simulation study and numerical examples indicate that the proposed FRVIF successfully detects multicollinearity and is faster than the other methods.
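The failure of the classical measure is easy to reproduce. With exactly two regressors, CVIF = 1/(1 - r^2), where r is their sample correlation. In the made-up example below, the regressors form a 3 x 3 design grid and are exactly uncorrelated; appending one high leverage point far out in both directions makes CVIF jump well above common warning thresholds, even though the remaining nine points show no collinearity.

```python
def cvif_two(x1, x2):
    """Classical VIF when there are exactly two regressors: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return 1.0 / (1.0 - s12 * s12 / (s11 * s22))


# A 3 x 3 design grid: x1 and x2 are exactly uncorrelated
x1 = [1, 2, 3] * 3
x2 = [1] * 3 + [2] * 3 + [3] * 3
print(cvif_two(x1, x2))                # no collinearity
print(cvif_two(x1 + [20], x2 + [20]))  # one HLP far out in both directions
```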
2014

Studying the Scientific State Of Students Using the Adjusted Residuals

2014-08
Journal of Global Research in Mathematical Archives (Issue : 2) (Volume : 2)
The aim of this article is to apply adjusted residuals in the analysis of two-way contingency tables to determine which cells contribute to the significance of the chi-square statistic.
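For reference, Haberman's adjusted residual for cell (i, j) is d_ij = (O_ij - E_ij) / sqrt(E_ij (1 - r_i/N)(1 - c_j/N)), with E_ij = r_i c_j / N, where r_i and c_j are the row and column totals. Under independence it is approximately standard normal, so cells with |d_ij| > 1.96 are the ones driving a significant chi-square at the 5% level. A pure-Python sketch; the example table is made up.

```python
import math


def adjusted_residuals(table):
    """Haberman adjusted residuals for a two-way contingency table (list of rows)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    out = []
    for i, row in enumerate(table):
        out_row = []
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n
            var = exp * (1 - rows[i] / n) * (1 - cols[j] / n)
            out_row.append((obs - exp) / math.sqrt(var))
        out.append(out_row)
    return out


# Hypothetical 2 x 3 table: rows = groups of students, columns = grade bands
table = [[20, 30, 10], [10, 30, 40]]
for row in adjusted_residuals(table):
    print([round(d, 2) for d in row])
```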

SIGNIFICANT FACTORS TO AFFECT THE BLOOD PRESSURE

2014-08
International Journal of Advances in Engineering & Technology (Issue : 2) (Volume : 7)
Stepwise regression and other methods are used to identify the best model with and without an intercept, and the leverage point method is applied when a new point is added to the original data. The significance of the intercept is tested using the F test, AIC, and Cp.
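To make the intercept comparison concrete, a sketch for one predictor: fit y = b0 + b1 x and y = b1 x by least squares, then compare them with the partial F statistic F = (RSS_0 - RSS_1) / (RSS_1 / (n - 2)), the Gaussian AIC n ln(RSS/n) + 2p (up to an additive constant), and Mallows' Cp = RSS_p / s^2 - n + 2p, with s^2 taken from the model with intercept. The data are made up, and the stepwise search and leverage-point analysis are not shown.

```python
import math


def rss(x, y, intercept=True):
    """Residual sum of squares of y = a + b*x (or y = b*x if intercept is False)."""
    if intercept:
        n = len(x)
        xbar, ybar = sum(x) / n, sum(y) / n
        b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
        a = ybar - b * xbar
    else:
        a = 0.0
        b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def intercept_tests(x, y):
    n = len(x)
    rss_full, rss_red = rss(x, y, True), rss(x, y, False)
    sigma2 = rss_full / (n - 2)             # error variance from the model with intercept
    f_stat = (rss_red - rss_full) / sigma2  # one numerator degree of freedom
    aic = {"with": n * math.log(rss_full / n) + 2 * 2,
           "without": n * math.log(rss_red / n) + 2 * 1}
    cp = {"with": rss_full / sigma2 - n + 2 * 2,
          "without": rss_red / sigma2 - n + 2 * 1}
    return f_stat, aic, cp


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [6.1, 6.9, 8.2, 8.8, 10.1, 11.0, 11.8, 13.1]
print(intercept_tests(x, y))
```

By construction Cp equals p = 2 for the model that supplies s^2, so Cp is informative mainly for the reduced model.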

Testing Feedback by Using F Test

2014-08
Global Research in Mathematical Archives (Issue : 2) (Volume : 2)
The present study deals with detecting feedback between rainfall as an output series and temperature as an input series. Two methods were used. The first tests the cross-correlation function between the prewhitened input and output series. The second applies an F test between the input and output series. The results show that there is no feedback between the input and output series in this study, based on the cross-correlation between the prewhitened input series (α_t) and the residual series (a_t).
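The first method rests on the sample cross-correlation r_xy(k) between two already prewhitened series at lag k, with values outside roughly ±2/sqrt(n) treated as significant; significant correlations at negative lags, where the output leads the input, would point to feedback. The sketch below assumes prewhitening has already been done (the ARIMA filtering step is not shown) and uses simulated white noise in which the output simply repeats the input two steps later.

```python
import math
import random


def cross_correlation(x, y, k):
    """Sample cross-correlation r_xy(k): x at time t paired with y at time t + k."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - xb) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - yb) ** 2 for v in y) / n)
    if k >= 0:
        s = sum((x[t] - xb) * (y[t + k] - yb) for t in range(n - k))
    else:
        s = sum((x[t - k] - xb) * (y[t] - yb) for t in range(n + k))
    return s / (n * sx * sy)


rng = random.Random(0)
n = 200
alpha = [rng.gauss(0, 1) for _ in range(n)]  # stands in for the prewhitened input
beta = [0.0, 0.0] + alpha[:-2]               # output driven by the input two steps earlier
band = 2 / math.sqrt(n)
for k in range(-3, 4):
    r = cross_correlation(alpha, beta, k)
    print(k, round(r, 3), "significant" if abs(r) > band else "")
```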

Studying the Scientific State Of Students Using the Adjusted Residuals

2014-08
Mathematical Theory and Modeling (Issue : 4) (Volume : 4)
The aim of this article is to apply adjusted residuals in the analysis of two-way contingency tables to determine which cells contribute to the significance of the chi-square statistic.
2013

ON forecasting by Dynamic Regression models

2013-08
J. of University of Anbar for Pure Science (Issue : 3) (Volume : 7)
This research applies statistical techniques to study the time series of average monthly humidity as an output series together with one of the variables that affects it, the average monthly relative rainfall series, as an input, measured at the Duhok meteorological station. The techniques used are ARIMA modeling and dynamic regression modeling. The best dynamic regression model selected was suitable for forecasting future values.
