A New Two Parameter Biased Estimator for the Unrestricted Linear Regression Model: Theory, Simulation and Application

This paper proposes a new biased estimator of the regression parameters of the multiple linear regression model when the regressors are correlated. Theoretical comparisons and simulation results show that the proposed estimator performs better than other existing estimators, in the smaller mean squared error sense, under certain conditions. A real-life dataset is analyzed to illustrate the findings of the paper.

Introduction
The ordinary least squares (OLS) estimator is the best linear unbiased estimator and has been used to estimate the parameters of the linear regression model since its inception. An important assumption of the linear regression model is that the regressors (independent variables) are independent. In practice, however, the regressors are often correlated, which causes the problem of multicollinearity. In the presence of multicollinearity, the OLS estimator is inefficient and may give the wrong signs of the parameters in the multiple linear regression model [1]. To handle these problems, many authors have proposed one-parameter biased estimators, to mention a few, [1][2][3][4][5][6][7] and, recently, [8]. For two-parameter estimators, the following authors are notable: [9][10][11][12][13][14][15], among others.

The main objective of this paper is to propose a new two-parameter biased estimator for the regression coefficients and to compare the performance of the new estimator with the OLS estimator, the ordinary ridge regression (ORR) estimator of Hoerl and Kennard [1], the Liu estimator of Liu [6], and the Kibria-Lukman (KL) estimator of Kibria and Lukman [8]. The rest of the paper is organized as follows: some estimators and their statistical properties are given in Section 2. The theoretical comparisons among the proposed and existing estimators, together with the biasing parameters k and d, are given in Section 3. A Monte Carlo simulation study is performed in Section 4, a real-life dataset is analyzed in Section 5, and some conclusions are given in Section 6.

The Model and Estimators
Consider the following linear regression model:

y = Xβ + ε,  (2.1)

where y is an n × 1 vector of the response variable, X is a known n × p full-rank matrix of explanatory variables, β is a p × 1 vector of unknown regression coefficients, and ε is an n × 1 vector of disturbances assumed to have mean vector 0 and variance-covariance matrix σ²I, where I is the identity matrix of order n × n. To define the various estimators, the canonical form of model (2.1) is

y = Zα + ε,  (2.2)

where Z = XC, α = C′β, and C is the orthogonal matrix whose columns are the eigenvectors of X′X, so that Z′Z = C′X′XC = Λ = diag(λ₁, …, λ_p), with λ₁ ≥ λ₂ ≥ … ≥ λ_p > 0 the eigenvalues of X′X.

The OLS estimator of α and its mean squared error matrix (MSEM) are given by

α̂ = Λ⁻¹Z′y,  MSEM(α̂) = σ²Λ⁻¹.

The ORR estimator [1], the Liu estimator [6] and the KL estimator [8] of α are, respectively,

α̂(k) = (Λ + kI)⁻¹Z′y, k > 0,
α̂(d) = (Λ + I)⁻¹(Λ + dI)α̂, 0 < d < 1,
α̂_KL = (Λ + kI)⁻¹(Λ − kI)α̂, k > 0.

If the optimal biasing parameter d̂_opt of the Liu estimator is negative, Özkale and Kaçıranlar [9] adopt the alternative biasing parameter

d̂_alt = min_i [ α̂ᵢ² / (σ̂²/λᵢ + α̂ᵢ²) ].

The new biased (NB) estimator of α is obtained by minimizing the penalized residual sum of squares in (2.15), where k and d are the Lagrangian multipliers. The solution of (2.15) gives the new estimator as

α̂_NB = [Λ + k(1 + d)I]⁻¹[Λ − k(1 + d)I]α̂, k > 0, 0 < d < 1.

The proposed NB estimator is a general estimator which includes the OLS and the KL estimators as special cases: α̂_NB = α̂ when k = 0, and α̂_NB = α̂_KL when d = 0. The MSEM of the proposed NB estimator of α is given by

MSEM(α̂_NB) = σ²AΛ⁻¹A′ + (A − I)αα′(A − I)′, where A = [Λ + k(1 + d)I]⁻¹[Λ − k(1 + d)I].

The lemmas below will be used for the theoretical comparisons among the estimators in the next section.

Lemma 1 (Farebrother): Let B be a positive definite matrix and b a nonzero vector. Then B − bb′ is nonnegative definite if and only if b′B⁻¹b ≤ 1.

Lemma 2 (Trenkler and Toutenburg): Let α̂₁ and α̂₂ be two linear estimators of α with bias vectors b₁ and b₂, and suppose D = Cov(α̂₁) − Cov(α̂₂) is positive definite. Then MSEM(α̂₁) − MSEM(α̂₂) is nonnegative definite if and only if b₂′(D + b₁b₁′)⁻¹b₂ ≤ 1.
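The estimators of Section 2 can be sketched in canonical form. The NB shrinkage factor k(1 + d) below is an assumption consistent with the paper's stated special cases (OLS at k = 0, KL at d = 0); the function names are illustrative, not from the paper.

```python
import numpy as np

def canonical_estimators(X, y, k, d):
    """OLS, ridge (ORR), Liu, KL and NB estimates of alpha in canonical form.

    NB is taken as (Lambda + k(1+d)I)^-1 (Lambda - k(1+d)I) alpha_ols,
    an assumed form that reduces to OLS at k = 0 and to KL at d = 0."""
    XtX = X.T @ X
    lam, C = np.linalg.eigh(XtX)           # eigenvalues (ascending) and orthogonal C
    Z = X @ C                               # canonical regressors: Z'Z = diag(lam)
    alpha_ols = (Z.T @ y) / lam             # Lambda^-1 Z'y
    alpha_orr = (Z.T @ y) / (lam + k)       # ordinary ridge regression
    alpha_liu = (lam + d) / (lam + 1) * alpha_ols    # Liu
    alpha_kl = (lam - k) / (lam + k) * alpha_ols     # Kibria-Lukman
    g = k * (1 + d)
    alpha_nb = (lam - g) / (lam + g) * alpha_ols     # proposed NB (assumed form)
    return {"OLS": alpha_ols, "ORR": alpha_orr, "Liu": alpha_liu,
            "KL": alpha_kl, "NB": alpha_nb}
```

Estimates of β are recovered as C @ alpha for any of the returned vectors.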

Comparison among the Estimators
Comparison between α̂ and α̂_NB
Theorem 3.1: MSEM(α̂) − MSEM(α̂_NB) is nonnegative definite if and only if

b′[σ²(Λ⁻¹ − AΛ⁻¹A′)]⁻¹b ≤ 1,

where A = [Λ + k(1 + d)I]⁻¹[Λ − k(1 + d)I] and b = (A − I)α is the bias vector of α̂_NB.
Proof: The difference of the covariance matrices, σ²(Λ⁻¹ − AΛ⁻¹A′), is diagonal with ith entry σ²[1 − (λᵢ − k(1 + d))²/(λᵢ + k(1 + d))²]/λᵢ > 0 for k > 0, so it is positive definite. Since α̂ is unbiased, the result follows from Lemma 2. The proof is completed.

Comparison between α̂(d) and α̂_NB
Theorem 3.3: Let F = (Λ + I)⁻¹(Λ + dI), b_d = (F − I)α and b_NB = (A − I)α, where A = [Λ + k(1 + d)I]⁻¹[Λ − k(1 + d)I]. If D = σ²(FΛ⁻¹F′ − AΛ⁻¹A′) is positive definite, then MSEM(α̂(d)) − MSEM(α̂_NB) is nonnegative definite if and only if

b_NB′(D + b_d b_d′)⁻¹b_NB ≤ 1.

Proof: Both estimators are linear in y, and the difference of their covariance matrices is D, which is positive definite by assumption. The result follows directly from Lemma 2. The proof is completed.
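The MSEM comparisons of this section can be checked numerically. The sketch below uses hypothetical parameter values and the NB form with shrinkage factor k(1 + d) (an assumption), and tests whether the MSEM difference between the Liu and NB estimators is nonnegative definite.

```python
import numpy as np

def msem(A, lam, alpha, sigma2):
    """MSEM of the linear estimator A @ alpha_ols in canonical coordinates:
    sigma^2 A Lambda^-1 A' + (A - I) alpha alpha' (A - I)'."""
    p = len(lam)
    cov = sigma2 * A @ np.diag(1.0 / lam) @ A.T
    bias = (A - np.eye(p)) @ alpha
    return cov + np.outer(bias, bias)

# Hypothetical true values chosen for illustration (ill-conditioned design)
lam = np.array([10.0, 1.0, 0.05])
alpha = np.array([1.0, 1.0, 1.0])
sigma2 = 1.0
k, d = 0.5, 0.2

A_liu = np.diag((lam + d) / (lam + 1.0))   # Liu estimator matrix F
g = k * (1.0 + d)                           # NB factor k(1+d) (assumed form)
A_nb = np.diag((lam - g) / (lam + g))

diff = msem(A_liu, lam, alpha, sigma2) - msem(A_nb, lam, alpha, sigma2)
# NB dominates Liu in the MSEM sense iff diff is nonnegative definite
nb_better = np.all(np.linalg.eigvalsh(diff) >= -1e-12)
```

Whether `nb_better` holds depends on λᵢ, α, σ², k and d, exactly as the theorems state.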
We now obtain the optimal values of k and d for the proposed NB estimator. First, minimizing the scalar mean squared error m = tr[MSEM(α̂_NB)] with respect to k for fixed d, i.e. differentiating m with respect to k and setting ∂m/∂k = 0, gives the optimal value of k; replacing α and σ² by their unbiased estimators yields the estimated optimal value k̂_min(NB). Similarly, the optimal value of d is found by differentiating m with respect to d for fixed k and setting ∂m/∂d = 0; plugging in the unbiased estimators gives the estimated optimal value d̂_min(NB) in (3.15). The parameters k and d of α̂_NB are then selected iteratively as follows:
1. Find an initial estimate d̂ of d.
2. Estimate k̂_min(NB) using d̂ from step 1.
3. Estimate d̂_min(NB) in (3.15) using k̂_min(NB) from step 2.
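Since the optimal k and d depend on unknown quantities, a pragmatic alternative to the iterative rule above (not the paper's procedure) is a small grid search minimizing a plug-in estimate of the scalar MSE; the NB shrinkage form with factor k(1 + d) below is an assumption.

```python
import numpy as np

def select_k_d(X, y, ks, ds):
    """Grid-search (k, d) minimizing a plug-in scalar MSE of the NB estimator.

    A pragmatic sketch, not the paper's iterative rule; the NB factors
    (lam - k(1+d)) / (lam + k(1+d)) are an assumed form."""
    XtX = X.T @ X
    lam, C = np.linalg.eigh(XtX)
    Z = X @ C
    a_ols = (Z.T @ y) / lam
    resid = y - Z @ a_ols
    n, p = X.shape
    s2 = resid @ resid / (n - p)            # unbiased estimate of sigma^2
    best = (None, None, np.inf)
    for k in ks:
        for d in ds:
            g = k * (1 + d)
            f = (lam - g) / (lam + g)       # componentwise shrinkage factors
            # plug-in scalar MSE: variance + squared bias, summed over components
            mse = np.sum(s2 * f ** 2 / lam + (f - 1) ** 2 * a_ols ** 2)
            if mse < best[2]:
                best = (k, d, mse)
    return best
```

The grid point k = 0 recovers OLS, so the search can never do worse than the OLS plug-in MSE.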

Simulation Study
A Monte Carlo simulation study is performed to show the performance of the NB estimator relative to some existing estimators. It contains two parts: (i) the simulation technique and (ii) a discussion of the results.

Simulation technique
Following Gibbons [24] and Kibria [19], the explanatory variables are generated by the equation

x_ij = (1 − ρ²)^(1/2) z_ij + ρ z_i,p+1,  i = 1, 2, …, n, j = 1, 2, …, p,

where z_ij are independent standard normal pseudo-random numbers and ρ is the correlation between any two explanatory variables; here ρ takes the values 0.9 and 0.99. The n observations of the response variable y are obtained from

y_i = α₁x_i1 + α₂x_i2 + … + α_p x_ip + ε_i,  ε_i ~ N(0, σ²).

The performance of each estimator is evaluated by its estimated mean squared error over the R replications,

MSE(α*) = (1/R) Σ_{r=1}^{R} (α*_r − α)′(α*_r − α),

where α* is any estimator of α and α is the true parameter vector. The estimated MSEs of the estimators are shown in Table 1, Table 2, Table 3 and Table 4 for (ρ = 0.90 and n = 50), (ρ = 0.99 and n = 50), (ρ = 0.90 and n = 100), and (ρ = 0.99 and n = 100), respectively.
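A minimal sketch of one such Monte Carlo run follows, assuming the standard generation scheme of Gibbons and Kibria cited above. The unit-length true coefficient vector is a simplification (papers in this literature often take the eigenvector of the largest eigenvalue), and the NB factor k(1 + d) is an assumption.

```python
import numpy as np

def simulate_mse(n=50, p=4, rho=0.9, sigma=1.0, k=0.5, d=0.5, reps=200, seed=1):
    """Monte Carlo estimate of the scalar MSE of OLS and the NB estimator
    under x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1}, z ~ N(0, 1) i.i.d."""
    rng = np.random.default_rng(seed)
    mse = {"OLS": 0.0, "NB": 0.0}
    for _ in range(reps):
        z = rng.normal(size=(n, p + 1))
        X = np.sqrt(1 - rho ** 2) * z[:, :p] + rho * z[:, [p]]
        XtX = X.T @ X
        lam, C = np.linalg.eigh(XtX)
        Z = X @ C                            # canonical regressors
        alpha = np.ones(p) / np.sqrt(p)      # true coefficients (simplified choice)
        y = Z @ alpha + rng.normal(scale=sigma, size=n)
        a_ols = (Z.T @ y) / lam
        g = k * (1 + d)
        a_nb = (lam - g) / (lam + g) * a_ols # proposed NB (assumed form)
        mse["OLS"] += np.sum((a_ols - alpha) ** 2) / reps
        mse["NB"] += np.sum((a_nb - alpha) ** 2) / reps
    return mse
```

Varying n, ρ, σ, k and d over the values in the paper reproduces the layout of Tables 1-4.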

Simulation results discussions
From Table 1, Table 2, Table 3 and Table 4, we observe that the estimated MSE values increase as σ and ρ increase, and decrease as n increases. The OLS estimator performs worst in all cases in the presence of multicollinearity. Moreover, the simulation results show that the proposed NB estimator performs better than the other estimators in most cases. The Liu estimator gives better MSE values when k = d = 0.1, 0.2, i.e. when the biasing parameters are near zero. For ρ = 0.9, the condition number (CN) is approximately 5 and the variance inflation factors (VIFs) are around 4 to 6; here the proposed NB estimator agrees closely with the KL estimator (while still giving better results) for low values of k = d, and performs increasingly better as k = d increases for a fixed value of σ. This improvement grows with σ, and an even larger improvement is observed when ρ = 0.99, where the CN and the VIFs become larger, approximately 15 and around 36 to 61, respectively. Thus, the proposed NB estimator works better for strongly correlated explanatory variables. In summary, the performance of the proposed NB estimator depends mainly on the values of ρ, σ, the biasing parameters k and d, and the true parameter vector. The simulation results are therefore consistent with the theoretical results.
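The CN and VIF diagnostics quoted above can be computed as follows. One common definition of the condition number is used here (the square root of the eigenvalue ratio of the standardized X′X); the paper may use a different convention.

```python
import numpy as np

def condition_number(X):
    """CN = sqrt(lambda_max / lambda_min) of the standardized X'X."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    lam = np.linalg.eigvalsh(Xs.T @ Xs)          # ascending eigenvalues
    return np.sqrt(lam[-1] / lam[0])

def vif(X):
    """VIF_j = 1 / (1 - R_j^2): diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))
```

Large CN and VIFs well above 10 are the usual warning signs of harmful multicollinearity.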

Application
To illustrate the theoretical and simulation results of this paper, we consider real-life data in this section. The Portland cement data was originally adopted by Woods, et al. [28]. This dataset has also been analyzed by many researchers, for example [29,30], Lukman, et al. [14] and, recently, Kibria and Lukman [8], among others. It is analyzed here to compare the performance of the proposed NB estimator with the other existing estimators. The correlation coefficient matrix of the explanatory variables is presented in Table 5; there is a significant and strong relationship between the following pairs of explanatory variables: X1 and X3, and X2 and X4. The estimated parameters and the MSE values of the estimators are presented in Table 6. It appears from Table 6 that the proposed NB estimator performs best: it gives an obvious improvement over all the existing estimators and a small improvement over the KL estimator. This is consistent with the simulation results, because σ here is small and not all the explanatory variables are significantly or strongly correlated to the same degree, even though the data have a high CN and high VIFs. Note: * Correlation is significant at the 0.05 level.
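For reference, the strong correlations reported in Table 5 can be reproduced from the classical Portland cement (Hald) data. The values below are the widely published ones, reproduced from memory of the classical dataset, and should be checked against the original source before reuse.

```python
import numpy as np

# Portland cement (Hald) data of Woods et al.: 4 regressors, 13 observations.
X = np.array([
    [7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20], [11, 31, 8, 47],
    [7, 52, 6, 33], [11, 55, 9, 22], [3, 71, 17, 6], [1, 31, 22, 44],
    [2, 54, 18, 22], [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
    [10, 68, 8, 12],
], dtype=float)
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7, 72.5,
              93.1, 115.9, 83.8, 113.3, 109.4])

R = np.corrcoef(X, rowvar=False)   # correlation matrix, as in Table 5
r13, r24 = R[0, 2], R[1, 3]        # the two strongly correlated pairs
```

Both correlations are strongly negative, reflecting that the four ingredient percentages sum to nearly 100 for every observation, which is the source of the multicollinearity in this dataset.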

Some Concluding Remarks
In this paper, we proposed a new biased (NB) estimator for handling the multicollinearity problem in the multiple linear regression model. Some existing estimators are special cases of the proposed estimator. The proposed NB estimator was compared theoretically with the ordinary least squares (OLS), ordinary ridge regression (ORR), Liu and Kibria-Lukman (KL) estimators, and the biasing parameters k and d of the NB estimator were derived. A Monte Carlo simulation study was performed to compare the performance of the OLS, ORR, Liu, KL and proposed NB estimators. The main finding of the simulation is that the proposed NB estimator performed better than the above-mentioned estimators under some conditions. A real-life dataset was analyzed to support the findings of the paper.