Chapter 3 Linear Regression, Q14
# (a) Generate the predictors and the response
set.seed(1)
x1 = runif(100)
x2 = 0.5*x1+rnorm(100)/10
y = 2+2*x1+0.3*x2+rnorm(100)
The linear model has the form y = beta0 + beta1*x1 + beta2*x2 + error, with regression coefficients beta0 = 2, beta1 = 2, beta2 = 0.3.
# (b) Correlation and scatterplot of x1 and x2
cor(x1,x2)
plot(x1,x2)
# (c) Least squares fit of y on both x1 and x2
lm.model1 = lm(y~x1+x2)
summary(lm.model1)
The coefficient estimates are as follows:
beta0 = 2.13, beta1 = 1.4396, beta2 = 1.0097
Only the estimate of beta0 is close to its true value; the estimates of beta1 and beta2 are far from the true beta1 = 2 and beta2 = 0.3.
The p-value for the t-statistic of beta1 is just below the 5% critical level, so we can reject the null hypothesis that beta1 = 0. However, the p-value for the t-statistic of beta2 is larger than 5%, so we cannot reject the null hypothesis that beta2 = 0.
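As a quick numerical check, the estimates, standard errors, t-statistics, and p-values can be pulled straight out of the summary object (a minimal base-R sketch):
coef(summary(lm.model1))                  # full coefficient table
coef(summary(lm.model1))[, "Pr(>|t|)"]    # just the p-values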
# (d) Least squares fit of y on x1 alone
lm.model2 = lm(y~x1)
summary(lm.model2)
The standard error of the estimate of beta1 has decreased.
The p-value associated with the t-statistic of beta1 is near zero. Therefore we can reject the null hypothesis that beta1=0.
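The decrease can be verified by comparing x1's standard error across the two fits (a minimal sketch):
coef(summary(lm.model1))["x1", "Std. Error"]   # x1 in the joint fit from (c)
coef(summary(lm.model2))["x1", "Std. Error"]   # x1 fit alone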
# (e) Least squares fit of y on x2 alone
lm.model3 = lm(y~x2)
summary(lm.model3)
The standard error of the estimate of beta2 has decreased.
The p-value associated with the t-statistic of beta2 is near zero. Therefore we can reject the null hypothesis that beta2 = 0.
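And similarly for x2 (again a minimal sketch):
coef(summary(lm.model1))["x2", "Std. Error"]   # x2 in the joint fit from (c)
coef(summary(lm.model3))["x2", "Std. Error"]   # x2 fit alone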
The results in (c) and (e) contradict each other. According to the results in (c) we could not reject the null hypothesis that beta2 = 0, yet the results in (e) reject it decisively. We get such results because of the high collinearity between x1 and x2. Collinearity inflates the standard errors of the coefficient estimates, which shrinks the t-statistics, and so we fail to reject the null hypothesis. Only after removing one of the predictors from the model do we get a p-value that reflects the predictor's real association with y.
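One way to quantify the collinearity is the variance inflation factor (VIF). With only two predictors it can be computed directly from the R-squared of regressing one predictor on the other, so no extra packages are needed (a minimal sketch):
r2 = summary(lm(x1 ~ x2))$r.squared   # R^2 from regressing x1 on x2
1/(1 - r2)                            # VIF; identical for x1 and x2 here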
# (g) Add one additional, mismeasured observation
x1 = c(x1, 0.1)
x2 = c(x2, 0.8)
y = c(y,6)
# Refit the model from (c) with the new observation
lm.model_g = lm(y~x1+x2)
summary(lm.model_g)
par(mfrow=c(2,2))
plot(lm.model_g)
# Average leverage is (p+1)/n, with p = 2 predictors
p = 2
n = length(y)
(p+1)/n
Effect of the new observation on the model in (c):
According to the studentized residuals vs. leverage plot, the newly added observation (index 101) has a leverage statistic of about 0.4, which greatly exceeds the average leverage (p+1)/n. Therefore this observation is a high-leverage point. It is not an outlier, since in the studentized residuals vs. fitted values plot all residuals lie between -2 and 2.
After the addition of the new observation, the coefficient of x2 becomes significant and the coefficient of x1 becomes insignificant.
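The leverage of the new point can also be read off numerically rather than from the plot (a minimal sketch; the 2*(p+1)/n cutoff is a common rule of thumb):
lev = hatvalues(lm.model_g)    # leverage statistic of each observation
lev[101]                       # leverage of the added point
which(lev > 2*(p+1)/n)         # observations with unusually high leverage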
# Refit the model from (d) with the new observation
lm.model_g2 = lm(y~x1)
summary(lm.model_g2)
par(mfrow=c(2,2))
plot(lm.model_g2)
par(mfrow=c(1,1))
plot(predict(lm.model_g2),rstudent(lm.model_g2))   # studentized residuals vs fitted values
Effect of the new observation on the model in (d):
There is no high-leverage point in the data. In the studentized residuals vs. fitted values plot, the one observation that falls outside the range -3 to 3 is an outlier.
The R-squared value has decreased.
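The outlier can be identified directly from the studentized residuals (a minimal sketch):
rst = rstudent(lm.model_g2)
which(abs(rst) > 3)   # observations with |studentized residual| > 3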
# Refit the model from (e) with the new observation
lm.model_g3 = lm(y~x2)
summary(lm.model_g3)
par(mfrow=c(2,2))
plot(lm.model_g3)
par(mfrow=c(1,1))
plot(predict(lm.model_g3),rstudent(lm.model_g3))   # studentized residuals vs fitted values
Effect of the new observation on the model in (e):
The studentized residuals vs. leverage plot shows that there is one observation with high leverage.
The studentized residuals vs. fitted values plot shows that all observations lie between -3 and 3, so there are no outliers.
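Again this can be checked numerically (a minimal sketch; with p = 1 predictor here, a common rule-of-thumb cutoff is 2*(1+1)/n):
lev3 = hatvalues(lm.model_g3)
lev3[101]                 # leverage of the added point
which(lev3 > 2*(1+1)/n)   # rule-of-thumb high-leverage observations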