Published Journal Articles
2024
A Remedial Measure of Multicollinearity in Multiple Linear Regression in the Presence of High Leverage Points
2024-04
Sains Malaysiana (Issue : 4) (Volume : 53)
Ordinary least squares (OLS) is a widely used method in the multiple linear regression model due to tradition
and its optimal properties. Nonetheless, in the presence of multicollinearity, the OLS method is inefficient because
the standard errors of its estimates become inflated. Many methods have been proposed to remedy this problem,
including the Jackknife Ridge Regression (JAK). However, the performance of JAK is poor when multicollinearity and
high leverage points (HLPs), which are outlying observations in the X-direction, are present in the data. As a solution
to this problem, Robust Jackknife Ridge MM (RJMM) and Robust Jackknife Ridge GM2 (RJGM2) estimators have been put
forward. Nevertheless, they are still not very efficient because they suffer from long computational running times
and some bias, and they do not have the bounded influence property. This paper proposes a robust Jackknife ridge
regression that integrates a generalized M estimator and the fast improvised Gt (GM-FIMGT) estimator.
We name this method the robust Jackknife ridge regression based on GM-FIMGT, denoted as RJFIMGT. The numerical
results show that the proposed RJFIMGT method is the best method, as it has the lowest values of RMSE
and bias compared to the other methods in this study.
2023
An efficient method of identification of influential observation in multiple linear regression and its application to real data
2023-11
Sains Malaysiana (Issue : 12) (Volume : 52)
Influential observations (IOs) are those observations which, either alone or together with several other observations,
have a detrimental effect on the computed values of various estimates. As such, it is very important to detect their
presence. Several methods have been proposed to identify IOs, including the fast improvised influential distance
(FIID). The FIID method has been shown to be more efficient than some existing methods. Nonetheless, the shortcomings
of the FIID method are that it is computationally unstable, still suffers from masking and swamping effects, is time
consuming, and does not use a proper cut-off point. As a solution to this problem, a new robust version of the influential
distance method (RFIID), which is based on the Reweighted Fast Consistent and High Breakdown (RFCH) estimator, is
proposed. The results of real data and a Monte Carlo simulation study indicate that the RFIID is able to correctly separate
the IOs from the rest of the data with the least computational running time, the least swamping effect, and no masking effect
compared to the other methods in this study.
Detection of Outlier in Time Series with Application to Dohuk Dam Using the SCA Statistical System
2023-08
General Letters in Mathematics (GLM) (Issue : 2) (Volume : 13)
Outliers are data points or observations that stand out significantly from the rest of the group in terms of size or
frequency. They are also referred to as "abnormal data". Before fitting a forecasting model, outliers are often
eliminated from the data set or, if not removed, the forecasting model is altered to account for the presence of
outliers. The first scenario covered in the study is the detection of outliers when the parameters have been
established. The second is the case where the parameters are unknown. This article mentions a number of reasons for outlier
correction and detection in time series analysis and forecasting. For the objective of the study, a time series of the
volume of water entering the Dohuk dam reservoir in Dohuk city was used. The study arrived at the following
conclusions: first, whenever the critical value increased, the value of the residual
standard error (with outlier adjustment) increased. Second, the number of outlier values dropped each time the
critical value was raised. Third, forecasts with outlier correction perform better than forecasts without outlier
adjustment when outliers are present.
A CLASSIFICATION OF OUTLIERS IN TRANSFORMED VARIABLES
2023-04
Journal of University of Duhok (Issue : 1) (Volume : 28)
The diagnosis of outliers is essential because outliers are responsible for producing large
interpretative problems in linear and nonlinear regression analysis. A lot of work has been
done on identifying outliers in linear regression, but not in nonlinear regression. In practice, it is
often the case that the assumptions of linear regression are violated, such as when highly influential outliers
exist in the dataset, which adversely affects the validity of the statistical analysis. Finding outliers is
important because they are responsible for invalid inferences and inaccurate predictions, as they have a
large impact on the computed values of various estimates. The outliers must be divided into vertical
outliers (VO), good leverage points (GLP), and bad leverage points (BLP), since only the vertical outliers
and bad leverage points have an undue effect on parameter estimates. We compare several outlier detection
techniques using a robust diagnostic plot to correctly classify good and bad leverage points and vertical
outliers, by decreasing both masking and swamping effects for both the untransformed variables and
the transformed variables. The main idea is to detect outliers before transformation (original data) and
after transformation. The results of the simulation study and numerical examples indicate that the modified generalized
DFFITS (difference in fits) against the Diagnostic Robust Generalized Potential (MGDFF-DRGP)
successfully detects outliers in the data.
2022
K-Nearest Neighbor Method with Principal Component Analysis for Functional Nonparametric Regression
2022-11
Baghdad Science Journal (Issue : 6) (Volume : 19)
This paper proposes a new method for functional non-parametric regression data analysis with
conditional expectation in the case where the covariates are functional and Principal Component Analysis
is used to de-correlate the multivariate response variables. It uses the formula of the Nadaraya-Watson
estimator (K-Nearest Neighbour (KNN)) for prediction with different types of semi-metrics
(based on the second derivative and Functional Principal Component Analysis (FPCA)) for
measuring the closeness between curves. Root mean square error is used to evaluate this
model, which is then compared to the independent response method. The R program is used for analysing the data.
The results show that, when the covariates are functional and Principal Component Analysis is used to de-correlate
the multivariate response variables, the model is preferable to the independent response method.
The models are demonstrated on both simulated and real data.
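The kernel estimator with a k-nearest-neighbour bandwidth described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the curves are assumed to be sampled on a common, equally spaced grid, the semi-metric is a simple Euclidean distance between discrete second differences, the kernel is quadratic, the bandwidth is taken as the distance to the (k+1)-th nearest curve, and the toy data and all function names are invented for illustration.

```python
import math


def second_differences(curve):
    """Discrete second derivative of a curve sampled on an equally spaced grid."""
    return [curve[i - 1] - 2 * curve[i] + curve[i + 1] for i in range(1, len(curve) - 1)]


def semi_metric(curve_a, curve_b):
    """Euclidean distance between second differences (an illustrative semi-metric)."""
    da, db = second_differences(curve_a), second_differences(curve_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(da, db)))


def knn_kernel_predict(train_curves, train_y, new_curve, k):
    """Nadaraya-Watson type prediction with a k-nearest-neighbour bandwidth."""
    if not 1 <= k < len(train_curves):
        raise ValueError("k must satisfy 1 <= k < number of training curves")
    dists = [semi_metric(c, new_curve) for c in train_curves]
    # Assumption: bandwidth = distance to the (k+1)-th nearest curve,
    # so that the k nearest curves can receive positive weight.
    h = sorted(dists)[k]
    if h > 0:
        weights = [max(0.0, 1.0 - (d / h) ** 2) for d in dists]  # quadratic kernel
    else:
        weights = [0.0] * len(dists)
    if sum(weights) == 0:
        # Degenerate case (ties at the bandwidth): plain average within the bandwidth.
        weights = [1.0 if d <= h else 0.0 for d in dists]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


# Toy data (hypothetical): curves a * t^2 with scalar response a.
grid = range(5)
train_curves = [[a * t ** 2 for t in grid] for a in (1, 2, 3, 4)]
train_y = [1.0, 2.0, 3.0, 4.0]
new_curve = [2.5 * t ** 2 for t in grid]

prediction = knn_kernel_predict(train_curves, train_y, new_curve, k=2)
print(prediction)
```

In this toy case the two nearest curves are equally close to the new curve, so the prediction sits midway between their responses.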
Estimating Regression Coefficients using Robust Bootstrap with application to Covid-19 Data
2022-08
General Letters in Mathematics (Issue : 2) (Volume : 12)
The linear regression model is often used by researchers and data analysts for predictive, descriptive, and inferential
purposes. When working with empirical data, this model is based on a set of assumptions that are not always satisfied. In this
situation, using more complicated regression algorithms that do not strictly rely on the same assumptions might be one answer.
Nevertheless, transformations provide a simpler technique for improving the validity of model assumptions and allow the user
to continue using the well-known model of linear regression. The main objective of this project is to provide a transformation
for the linear model’s response and predictor variables, as well as parameter estimation methods before the transformation and
after the transformation. The bootstrap approach has been effectively used for many statistical estimates and inference issues,
according to the paper.
A Nonlinear Transformation Methods Using Covid-19 Data in the Kurdistan Region
2022-04
2022 International Conference on Computer Science and Software Engineering (CSASE) (Issue : 6)
Ordinary least squares (OLS) is the most widely used method for estimating the parameters of linear and nonlinear regression models, due to tradition and its optimal properties. Nevertheless, in the presence of outliers in the data, OLS estimates become inefficient, and even a single unusual point can have a significant impact on the estimation of parameters. One remedy in the presence of outliers is the use of robust estimators rather than OLS. Another is finding a suitable nonlinear transformation to reduce anomalies, including non-additivity, heteroscedasticity, and non-normality in multiple nonlinear regression. It might be beneficial to transform the response variable or the predictor variable, or both together, to present the equation in a simple functional form that is linear in the transformed variables. To illustrate the superior transformation function, we compare the squared correlation coefficient (coefficient of …
2021
Robust Multicollinearity Diagnostic Measure For Fixed Effect Panel Data Model
2021-10
Malaysian J. Fundam. Appl. Sci. (Issue : 5) (Volume : 17)
It is now evident that high leverage points (HLPs) can induce a multicollinearity pattern in the data of a fixed effect panel data model. The observations that are responsible for this phenomenon are called high leverage collinearity-enhancing observations (HLCEOs). The commonly used within group ordinary least squares (WOLS) estimator for estimating the parameters of the fixed effect panel data model is easily affected by HLCEOs. In their presence, the WOLS estimates may have large variances, and this would lead to erroneous interpretation. Therefore, it is imperative to detect the multicollinearity which is caused by HLCEOs. The classical Variance Inflation Factor (CVIF) is the commonly used diagnostic method for detecting multicollinearity in panel data. However, it does not correctly diagnose multicollinearity in the presence of HLCEOs. Hence, in this paper three new robust methods of diagnosing multicollinearity in panel data are proposed, namely the RVIF (WGM-FIMGT), RVIF (WGMDRGP) and RVIF (WMM), and their performance is compared with that of the CVIF. The numerical evidence shows that the CVIF incorrectly diagnoses multicollinearity, whereas our proposed methods correctly diagnose no multicollinearity in the presence of HLCEOs, with RVIF (WGM-FIMGT) being the best method as it has the least computational running time.
Fast Improvised Influential Distance for the Identification of Influential Observations in Multiple Linear Regression
2021-08
Sains Malaysiana (Issue : 7) (Volume : 50)
Influential observations (IO) are those observations that are responsible for misleading conclusions about the fitting
of a multiple linear regression model. Existing IO identification methods such as the influential distance (ID) are not
very successful in detecting IO. It is suspected that the ID employs an inefficient method with a long computational
running time for the identification of the suspected IO at the initial step. Moreover, this method declares good leverage
observations as IO, resulting in misleading conclusions. In this paper, we propose the fast improvised influential distance
(FIID), which can successfully identify IO, good leverage observations, and regular observations with a shorter computational
running time. A Monte Carlo simulation study and real data examples show that the FIID correctly identifies genuine IO in
the multiple linear regression model with no masking and a negligible swamping rate.
Simple and Fast Generalized - M (GM) Estimator and Its Application to Real Data Set
2021-08
J. of university of anbar for pure science (Issue : 3) (Volume : 7)
It is now evident that some robust methods such as the MM-estimator do not address the concept of a bounded influence
function, which means that their estimates can still be affected by outliers in the X direction, or high leverage points (HLPs),
even though they have high efficiency and a high breakdown point (BDP). The Generalized M (GM) estimator, such as
the GM6 estimator, was put forward with the main aim of bounding the influence of HLPs by means of a weight
function. The limitation of GM6 is that it gives lower weight to both bad leverage points (BLPs) and good leverage
points (GLPs), which decreases its efficiency when more GLPs are present in a data set. Moreover, the GM6 takes
a longer computational time. In this paper, we develop a new version of the GM-estimator which is based on a simple and fast
algorithm. The attractive feature of this method is that it downweights only BLPs and vertical outliers (VOs), which increases
its efficiency. The merit of our proposed GM estimator is studied through a simulation study and the well-known aircraft data set.
2018
ROBUST WITHIN GROUP ESTIMATOR FOR FIXED EFFECT PANEL DATA
2018-08
Pak. J. Statist. (Issue : 4) (Volume : 34)
In the presence of outliers, panel data estimators can be extremely biased. In this work
we used MM-centering to provide robust solutions to the Within Group parameter
estimates. The main contribution of this article is to propose a new version of the
Generalized M-estimator (GM) that provides good resistance against bad leverage points.
The advantage of this method over the existing methods is that it only minimizes the effect of bad
leverage points and outliers. The good leverage points are not downweighted, and this
increases the efficiency of this estimator. The effectiveness of the proposed estimator is
investigated using real and simulated data sets.
Fast improvised diagnostic robust measure for the identification of high leverage points in multiple linear regression
2018-08
journal of statistics and management systems (Issue : 6) (Volume : 21)
High leverage points in the data set have a massive effect in linear regression. Identification of high leverage points (HLP) is very important because they can lead to wrong conclusions in a regression model. The Diagnostic Robust Generalized Potential based on Index Set Equality IDRGP (ISE) is very successful in identifying high leverage points with less running time. However, it still exhibits a small rate of masking and swamping. A fast improvised method for the identification of high leverage points is proposed to reduce the effect of swamping and masking with a shorter running time. The results of the simulation study show the merit of our proposed method.
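For orientation, the classical (non-robust) way of flagging high leverage points uses the diagonal of the hat matrix. The sketch below shows that baseline for simple linear regression with an intercept. It is not the DRGP-based or fast improvised method of the paper: the 2p/n cut-off is the common rule of thumb, and the data and function names are invented for illustration.

```python
def hat_values(x):
    """Leverage h_ii for simple linear regression with an intercept:
    h_ii = 1/n + (x_i - mean)^2 / Sxx."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; leverage is undefined")
    return [1 / n + (xi - mean) ** 2 / sxx for xi in x]


def flag_high_leverage(x, n_params=2):
    """Indices whose leverage exceeds the rule-of-thumb cut-off 2p/n."""
    cutoff = 2 * n_params / len(x)
    return [i for i, h in enumerate(hat_values(x)) if h > cutoff]


# Hypothetical predictor values: the last observation lies far out in the X direction.
x = [1, 2, 3, 4, 5, 6, 7, 8, 30]
leverage = hat_values(x)
flagged = flag_high_leverage(x)
print(flagged)
```

The hat values always sum to the number of fitted parameters (here 2), which is a quick sanity check. Because the mean and the spread of x are themselves pulled by outlying points, this classical rule can suffer from masking and swamping, which is the motivation for robust alternatives.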
The Performance of fast robust Variance Inflation Factor in the presence of high leverage points
2018-08
Journal of Engineering and Applied Sciences (Issue : 16) (Volume : 13)
The detection of multicollinearity is crucial so that proper remedial measures can be taken in its presence. The most widely used diagnostic method for detecting multicollinearity in multiple linear regression is the Classical Variance Inflation Factor (CVIF). It is now evident that the CVIF fails to correctly detect multicollinearity when high leverage points are present in a set of data. The Robust Variance Inflation Factor (RVIF) has been introduced to remedy this problem. Nonetheless, the computation of the RVIF takes a long time because it is based on the robust GM (DRGP) estimator, which depends on the Minimum Volume Ellipsoid (MVE) estimator, which requires a lot of computing time. In this paper, we propose a fast RVIF (FRVIF) which takes less computing time. The results of the simulation study and numerical examples indicate that our proposed FRVIF successfully detects the multicollinearity problem at a faster rate compared to other methods.
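A small sketch can show why the classical VIF is sensitive to high leverage points. It covers only the two-predictor case, where VIF = 1 / (1 - r^2) with r the Pearson correlation between the predictors. This is the classical diagnostic, not the RVIF or FRVIF of the paper, and the data and function names are invented for illustration.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """Classical VIF when the model has exactly two predictors."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        raise ValueError("predictors are perfectly collinear; VIF is infinite")
    return 1.0 / (1.0 - r_squared)


# Hypothetical, nearly uncorrelated predictors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [5, 3, 6, 2, 7, 1, 8, 4]
vif_clean = vif_two_predictors(x1, x2)

# The same data with one observation far out in the X direction.
vif_with_hlp = vif_two_predictors(x1 + [50], x2 + [50])

print(vif_clean, vif_with_hlp)
```

A single outlying point is enough to push the classical VIF past the usual threshold of 10 even though the remaining observations show almost no collinearity.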
2014
Studying the Scientific State Of Students Using the Adjusted Residuals
2014-08
Journal of Global Research in Mathematical Archives (Issue : 2) (Volume : 2)
The aim of this article is to apply the adjusted residuals to the analysis of (two-way) contingency tables to
determine the cells that contribute to the significance of the chi-square statistic.
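As an illustration of the idea in this abstract, the sketch below computes adjusted residuals for a two-way contingency table using the standard formula (O - E) / sqrt(E (1 - row total / n)(1 - column total / n)). The table and the function name are invented for illustration, and the code assumes at least two rows and two columns, all with positive totals.

```python
import math


def adjusted_residuals(table):
    """Adjusted residuals for a two-way contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Estimated variance of (observed - expected) under independence.
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        result.append(out_row)
    return result


# Hypothetical 2 x 2 table of counts.
table = [[20, 10],
         [10, 20]]
residuals = adjusted_residuals(table)
for row in residuals:
    print([round(value, 3) for value in row])
```

Under independence each adjusted residual is approximately standard normal, so cells with an absolute value above roughly 2 are commonly read as the ones driving a significant chi-square statistic.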
SIGNIFICANT FACTORS TO AFFECT THE BLOOD PRESSURE
2014-08
International Journal of Advances in Engineering & Technology (Issue : 2) (Volume : 7)
This study uses stepwise regression and other methods to determine the best method when the model is fitted with and
without an intercept, and applies the leverage point method when a new point is added to the original data. The
significance of the intercept is tested using the F, AIC, and Cp criteria.
Testing Feedback by Using F Test
2014-08
Global Research in Mathematical Archives (Issue : 2) (Volume : 2)
The present study deals with the detection of feedback between rainfall as an output series and temperature as an input
series. Two methods were used. The first method tests the cross-correlation function between the prewhitened input and
output series. The second method applies the F test to the input and output series. The results show that there is no
feedback between the input and output series presented in this study, based on the cross-correlation found between
the prewhitened input series (at) and the residual series (at).
Studying the Scientific State Of Students Using the Adjusted Residuals
2014-08
Mathematical Theory and Modeling (Issue : 4) (Volume : 4)
The aim of this article is to apply the adjusted residuals to the analysis of (two-way) contingency tables to determine the cells that contribute to the significance of the chi-square statistic.
2013
ON forecasting by Dynamic Regression models
2013-08
J. of university of anbar for pure science (Issue : 3) (Volume : 7)
This research includes the application of some statistical techniques for studying the time series of the average monthly humidity as an output series together with one of the variables that affect it, namely the series of the average monthly relative rainfall as an input, measured at the meteorological station of Duhok. The techniques used are modeling by an ARIMA model as well as the dynamic regression model. The dynamic regression model selected was suitable for determining future forecast values.