Chapter 3 Linear Regression Question 11

In [2]:
set.seed(1)
x=rnorm(100)
y=2*x+rnorm(100)

(a)

In [3]:
lm.model = lm(y~x+0)
summary(lm.model)
Call:
lm(formula = y ~ x + 0)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9154 -0.6472 -0.1771  0.5056  2.3109 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
x   1.9939     0.1065   18.73   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9586 on 99 degrees of freedom
Multiple R-squared:  0.7798,	Adjusted R-squared:  0.7776 
F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16
In [4]:
plot(x,y)
abline(lm.model)
In [5]:
summary.lm(lm.model)$coefficients
  Estimate Std. Error  t value     Pr(>|t|)
x 1.993876  0.1064767 18.72593 2.642197e-34

The p-value for the t-statistic of the coefficient is essentially zero (2.64e-34), so we reject the null hypothesis that the coefficient is zero: the estimate is statistically significant. The estimate, 1.9939 with a standard error of 0.1065, is very close to the true value of 2 used to generate the data.
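One way to make "close to the actual value" concrete is a confidence interval for the slope. The sketch below re-creates the simulated data so that it runs on its own, then checks whether the 95% interval from `confint` covers the true slope of 2. The name `contains.true` is introduced here for illustration and is not part of the original notebook.

```r
# Re-create the simulated data so this snippet is self-contained.
set.seed(1)
x = rnorm(100)
y = 2*x + rnorm(100)
lm.model = lm(y ~ x + 0)

# 95% confidence interval for the slope; the simulation used a true slope of 2.
ci = confint(lm.model, level = 0.95)
print(ci)

# TRUE if the interval covers the true slope.
contains.true = ci["x", 1] < 2 && 2 < ci["x", 2]
print(contains.true)
```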

(b)

In [6]:
lm.model_x_y = lm(x~y+0)
summary(lm.model_x_y)
Call:
lm(formula = x ~ y + 0)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.8699 -0.2368  0.1030  0.2858  0.8938 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
y  0.39111    0.02089   18.73   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4246 on 99 degrees of freedom
Multiple R-squared:  0.7798,	Adjusted R-squared:  0.7776 
F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16
In [7]:
summary.lm(lm.model_x_y)$coefficients
   Estimate Std. Error  t value     Pr(>|t|)
y 0.3911145 0.02088625 18.72593 2.642197e-34

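The p-value for the t-statistic is again essentially zero (2.64e-34), so we reject the null hypothesis that the coefficient is zero. The estimate is 0.3911 with a standard error of 0.0209, and the t-statistic (18.73) is identical to the one in (a).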

(c)

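Both (a) and (b) describe the same linear relationship between x and y, written in two forms: (a) y = 2*x + error and (b) x = 0.5(y - error). The t-statistic (18.73), the R-squared (0.7798) and the p-value are identical in the two fits. The estimated slope in (b), 0.3911, is not the reciprocal of the slope in (a), 1/1.9939 ≈ 0.5015, because (b) minimizes the squared errors in x rather than in y.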
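The link between the two slopes can be checked numerically. For regression through the origin, the product of the two slopes equals the (uncentered) R-squared that R reports for a no-intercept model, which is why the slope in (b) is smaller than the reciprocal of the slope in (a). This is a minimal sketch that re-creates the data; the names `b.yx`, `b.xy` and `r2` are introduced here for illustration.

```r
# Re-create the simulated data so this snippet is self-contained.
set.seed(1)
x = rnorm(100)
y = 2*x + rnorm(100)

b.yx = unname(coef(lm(y ~ x + 0)))   # slope of y onto x
b.xy = unname(coef(lm(x ~ y + 0)))   # slope of x onto y

# Without an intercept, summary() reports the uncentered R-squared,
# 1 - RSS / sum(y^2).
r2 = summary(lm(y ~ x + 0))$r.squared

# Product of the slopes, R-squared, and the reciprocal of the slope in (a).
print(c(b.yx * b.xy, r2, 1 / b.yx))
```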

(d)

$$
\beta = \frac{\sum xy}{\sum x^{2}} \hspace{1cm} SE(\beta) = \sqrt{\frac{\sum (y-x\beta)^{2}}{(n-1)\sum x^{2}}}
$$

$$
\begin{aligned}
t &= \frac{\beta}{SE(\beta)} = \frac{\sum xy}{\sum x^{2}} \sqrt{\frac{(n-1)\sum x^{2}}{\sum (y-x\beta)^{2}}} \\
&= \frac{\sqrt{n-1}\sum xy}{\sqrt{\sum x^{2} \sum (y-x\beta)^{2}}} \\
&= \frac{\sqrt{n-1}\sum xy}{\sqrt{\sum x^{2} \sum (y^{2}-2xy\beta+x^{2}\beta^{2})}} \\
&= \frac{\sqrt{n-1}\sum xy}{\sqrt{\sum x^{2} \left(\sum y^{2}-2\beta\sum xy+\beta^{2}\sum x^{2}\right)}} \\
&= \frac{\sqrt{n-1}\sum xy}{\sqrt{\sum x^{2} \left(\sum y^{2}-2\frac{\sum xy}{\sum x^{2}}\sum xy+\left(\frac{\sum xy}{\sum x^{2}}\right)^{2}\sum x^{2}\right)}} \\
&= \frac{\sqrt{n-1}\sum xy}{\sqrt{\sum x^{2}\sum y^{2}-2(\sum xy)^{2}+(\sum xy)^{2}}} \\
&= \frac{\sqrt{n-1}\sum xy}{\sqrt{\sum x^{2}\sum y^{2}-(\sum xy)^{2}}}
\end{aligned}
$$

In [13]:
n = length(x)
X = matrix(data=x, nrow=n, ncol=1)
Y = matrix(data=y, nrow=n, ncol=1)
# closed-form t-statistic from (d); named t.stat so it does not mask base R's t()
t.stat = (sqrt(n-1)*sum(X*Y))/sqrt(sum(X^2)*sum(Y^2)-(sum(X*Y))^2)
print(t.stat)
[1] 18.72593

The t-statistic calculated with the formula (18.72593) is the same as the t-statistic reported in the summary of the model in (a).

(e)

The formula for the t-statistic derived in (d) is symmetric in x and y: swapping x and y leaves both $\sum xy$ and $\sum x^{2} \sum y^{2}$ unchanged, so the t-statistic does not change.

(sqrt(n-1)*sum(X*Y))/sqrt(sum(X^2)*sum(Y^2)-(sum(X*Y))^2) = (sqrt(n-1)*sum(Y*X))/sqrt(sum(Y^2)*sum(X^2)-(sum(Y*X))^2)
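The symmetry can also be checked on data other than the sample used above. The sketch below wraps the closed-form expression in a hypothetical helper, `tstat_origin`, and compares it with the t values that `lm` reports in both directions. The seed, the variables `u` and `v`, and the negative slope are assumptions chosen for illustration (the negative slope also checks that the formula keeps the sign); none of them come from the original notebook.

```r
# Hypothetical helper: closed-form t-statistic for regression through the origin.
tstat_origin = function(x, y) {
  n = length(x)
  sqrt(n - 1) * sum(x * y) / sqrt(sum(x^2) * sum(y^2) - sum(x * y)^2)
}

set.seed(2)                       # arbitrary seed, not from the original notebook
u = rnorm(50)
v = -3*u + rnorm(50, sd = 2)      # negative slope, so the sign is tested too

t.formula = tstat_origin(u, v)    # formula with (x, y) = (u, v)
t.swapped = tstat_origin(v, u)    # formula with the arguments swapped
t.lm.vu = summary(lm(v ~ u + 0))$coefficients["u", "t value"]
t.lm.uv = summary(lm(u ~ v + 0))$coefficients["v", "t value"]

# All four values should agree.
print(c(t.formula, t.swapped, t.lm.vu, t.lm.uv))
```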

(f)

In [14]:
lm.model1 = lm(y~x)
summary(lm.model1)
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8768 -0.6138 -0.1395  0.5394  2.3462 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.03769    0.09699  -0.389    0.698    
x            1.99894    0.10773  18.556   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9628 on 98 degrees of freedom
Multiple R-squared:  0.7784,	Adjusted R-squared:  0.7762 
F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
In [15]:
lm.model2 = lm(x~y)
summary(lm.model2)
Call:
lm(formula = x ~ y)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.90848 -0.28101  0.06274  0.24570  0.85736 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.03880    0.04266    0.91    0.365    
y            0.38942    0.02099   18.56   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4249 on 98 degrees of freedom
Multiple R-squared:  0.7784,	Adjusted R-squared:  0.7762 
F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

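In both the regression of y onto x and the regression of x onto y, the t-statistic for $\beta_1$ is 18.56 (18.556 to three decimals), so the result from (e) also holds when the model includes an intercept.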
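One way to see why the two t-statistics agree when an intercept is included: in simple linear regression the slope's t-statistic can be written in terms of the sample correlation r as r*sqrt(n-2)/sqrt(1-r^2), and the correlation is symmetric in x and y. The sketch below re-creates the data and compares this expression with the values reported by `lm`; the names `t.cor`, `t.yx` and `t.xy` are introduced here for illustration.

```r
# Re-create the simulated data so this snippet is self-contained.
set.seed(1)
x = rnorm(100)
y = 2*x + rnorm(100)
n = length(x)

# t-statistic written in terms of the sample correlation.
r = cor(x, y)
t.cor = r * sqrt(n - 2) / sqrt(1 - r^2)

# Slope t-statistics from the two regressions with an intercept.
t.yx = summary(lm(y ~ x))$coefficients["x", "t value"]
t.xy = summary(lm(x ~ y))$coefficients["y", "t value"]

print(c(t.cor, t.yx, t.xy))
```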